finding
active
finding:8-layer-nonlin-achieves-near-perfect-iia-on-pythia-410m-at-all-training-steps-including-random-initialisation-on-ioi-task
8-layer ϕ_nonlin achieves near-perfect IIA on Pythia-410m at all training steps including random initialisation on IOI task
Training-progression result showing that the IIA of non-linear alignment maps is uncorrelated with genuine task learning
Source paper
extracted_from(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Claims (2)
claim
- Central thesis of the paper
- Empirical support for vacuousness of unrestricted causal abstraction
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Confirms theorem's existence proof holds but practical learnability fails with insufficient RevNet capacity
- Linear alignment map ϕ_lin IIA tracks DNN accuracy during Pythia-410m training progression on IOI task (finding, 0.775): suggests linear maps may be better correlated with genuine task implementation than non-linear maps
- One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
- Robustness check across seeds showing occasional failures of alignment map training
- Authors' tentative hypothesis from Fig. 4 but they acknowledge they cannot formalise this intuition
- Baseline accuracy showing small models fail on harder NPI licensing tasks
- Corroborating result on additional task confirming main paper findings
- Methodological limitation disproportionately affecting the largest MoE model, constraining generalizability.