finding
active
Across 5 Pythia seeds, one seed fails to learn the IOI task and another fails alignment despite learning the task; all other seeds achieve perfect alignment with ϕ_nonlin
Robustness check across seeds showing occasional failures of alignment map training
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Training progression result showing non-linear maps are uncorrelated with genuine task learning
- Shows high IIA on random models depends on entity overlap; generalisation is essential for genuine interpretation
- Attributed to model anisotropy from saturation making hidden states harder to access
- Mechanistic finding from CausalGym case study showing multi-step information movement in NPI mechanism
- Linear alignment map ϕ_lin IIA tracks DNN accuracy during Pythia-410m training progression on IOI task · finding · 0.745 · Suggests linear maps may be better correlated with genuine task implementation than non-linear maps
- NPI licensing mechanism in pythia-1b emerges in discrete stages (steps 1000, 2000, 3000) not gradually · finding · 0.745 · Training dynamics finding showing abrupt rather than gradual emergence of NPI mechanism
- Baseline accuracy showing small models fail on harder NPI licensing tasks
- Shows the passive vs. active divide is more important than the specific wording of instructions.