finding
active
finding:linear-alignment-map-lin-iia-tracks-dnn-accuracy-during-pythia-410m-training-progression-on-ioi-task
Linear alignment map ϕ_lin IIA tracks DNN accuracy during Pythia-410m training progression on IOI task
Suggests linear maps may be better correlated with genuine task implementation than non-linear maps
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- The fact that ϕ_lin tracks DNN performance more closely than ϕ_nonlin throughout training may support the linear representation hypothesis for IOI task features
  associated_with · supports
  Authors' tentative hypothesis from Fig. 4 but they acknowledge they cannot formalise this intuition
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Replicates Geiger et al. 2024b pattern of layer-dependent IIA degradation with linear maps
- Demonstrates that high IIA can be obtained even when model cannot solve the task
- Key empirical result: non-linear maps overcome linear maps' failure in deeper layers
- Attributed to model anisotropy from saturation making hidden states harder to access
- Shows high IIA on random models depends on entity overlap; generalisation is essential for genuine interpretation
- Training progression result showing non-linear maps are uncorrelated with genuine task learning
- Alignment map ϕ(h)=W_orth*h using orthogonal matrix; assumes linear representation hypothesis
- Linear probe achieves 100% classification accuracy for almost all components in Pythia-6.9B gender task
  finding · 0.760
  Demonstrates that linear probes can overestimate causal relevance; probes succeed on non-causally-relevant representations
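
The orthogonal alignment map listed above, ϕ(h) = W_orth·h, can be illustrated with a minimal sketch of a single interchange intervention in the rotated space. This is an assumption-laden illustration, not the source paper's code: W_orth is drawn at random here rather than learned, and the names `random_orthogonal`, `interchange`, and the subspace size `k` are hypothetical.

```python
import numpy as np


def random_orthogonal(d, seed=0):
    """Return a d x d orthogonal matrix (a random stand-in for a learned W_orth)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs using diag(r); q remains orthogonal after this step.
    return q * np.sign(np.diag(r))


def interchange(h_base, h_source, w_orth, k):
    """Swap the first k rotated coordinates of h_base with those of h_source.

    Both hidden states are mapped with phi(h) = W_orth @ h, the first k
    coordinates are exchanged, and the result is mapped back with W_orth.T,
    which is the inverse of an orthogonal matrix.
    """
    z_base = w_orth @ h_base
    z_source = w_orth @ h_source
    z_mixed = z_base.copy()
    z_mixed[:k] = z_source[:k]
    return w_orth.T @ z_mixed


d = 8
w_orth = random_orthogonal(d)
rng = np.random.default_rng(1)
h_base, h_source = rng.standard_normal(d), rng.standard_normal(d)
h_patched = interchange(h_base, h_source, w_orth, k=3)
```

With `k=0` the base state is returned unchanged, and with `k=d` the source state is recovered. Under the assumptions above, IIA would be the fraction of such interventions after which the model's output matches the prediction of the high-level causal model.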