hypothesis
active
The fact that ϕ_lin tracks DNN performance more closely than ϕ_nonlin throughout training may support the linear representation hypothesis for IOI task features
A tentative hypothesis the authors draw from Fig. 4; they acknowledge that they cannot formalise this intuition.
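As a purely illustrative reading of "tracks more closely", one could compare the mean absolute gap between each alignment map's IIA and the model's accuracy across training checkpoints. This is an assumption made here for illustration, not the authors' formalisation (they state they have none). The function name `tracking_gap` and every number below are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def tracking_gap(iia, accuracy):
    """Mean absolute gap between an alignment map's IIA and model accuracy.

    Both arguments are sequences with one value per training checkpoint.
    A smaller gap means the IIA curve stays closer to the accuracy curve.
    """
    if len(iia) != len(accuracy):
        raise ValueError("iia and accuracy must cover the same checkpoints")
    return mean(abs(i - a) for i, a in zip(iia, accuracy))


# Placeholder curves for illustration only (NOT data from the paper).
accuracy = [0.0, 0.2, 0.6, 0.9]       # hypothetical task accuracy per checkpoint
iia_lin = [0.1, 0.25, 0.55, 0.85]     # hypothetical IIA under a linear map
iia_nonlin = [0.9, 0.9, 0.95, 0.95]   # hypothetical IIA under a non-linear map

gap_lin = tracking_gap(iia_lin, accuracy)
gap_nonlin = tracking_gap(iia_nonlin, accuracy)
print(f"linear gap: {gap_lin:.4f}, non-linear gap: {gap_nonlin:.4f}")
```

A smaller gap for the linear map would be consistent with the hypothesis, but the gap alone says nothing about why the curves agree, which is the part the authors leave open.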
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Findings (1)
finding
- Linear alignment map ϕ_lin IIA tracks DNN accuracy during Pythia-410m training progression on IOI task · associated_with · supports · Suggests linear maps may be better correlated with genuine task implementation than non-linear maps
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Interpretive claim about what linear DAS results actually tell us
- Corroborating result on additional task confirming main paper findings
- Shows high IIA on random models depends on entity overlap; generalisation is essential for genuine interpretation
- Confirms theorem's existence proof holds but practical learnability fails with insufficient RevNet capacity
- Key empirical result: non-linear maps overcome linear maps' failure in deeper layers
- Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces. · hypothesis · 0.767 · Foundation for interpreting features as linear directions.
- Training progression result showing non-linear maps are uncorrelated with genuine task learning
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.752 · Explanation for why dictionary learning can recover many more features than dimensions.
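
The selection rule for the list above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal sketch under assumptions: how entities are actually embedded and stored is not stated here, so the vectors, entity ids, and the helper name `similar_untyped` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped(query, candidates, typed_ids, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold, best first.

    Entities that already share a typed edge with the query are skipped,
    so only candidates for new edges or possible duplicates remain.
    """
    scored = [
        (entity_id, cosine(query, vector))
        for entity_id, vector in candidates.items()
        if entity_id not in typed_ids
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 2-d embeddings for illustration only (hypothetical ids and vectors).
query = [1.0, 0.0]
candidates = {
    "a": [1.0, 0.1],  # nearly parallel to the query
    "b": [0.0, 1.0],  # orthogonal, falls below the threshold
    "c": [1.0, 1.0],  # 45 degrees, cosine about 0.707
    "d": [1.0, 0.0],  # identical, but already linked by a typed edge
}
typed_ids = {"d"}

result = similar_untyped(query, candidates, typed_ids)
for entity_id, score in result:
    print(entity_id, round(score, 3))
```

Excluding typed neighbours before thresholding keeps the list focused on relations that have not yet been recorded.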