finding
active
finding:linear-structure-in-llama-2-13b-representations-emerges-rapidly-in-early-middle-layers-later-for-conjunctive-statements
Linear structure in LLaMA-2-13B representations emerges rapidly in early-middle layers, later for conjunctive statements
Layer-wise PCA analysis shows hierarchical development of truth representations across the forward pass
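A minimal sketch of what a layer-wise PCA check of this kind could look like. It does not reproduce the source paper's code or results. The activations are synthetic stand-ins, and the helper names (`top_pcs`, `pc1_separation`, `synthetic_layer`) and the per-layer signal strengths are hypothetical values chosen only to illustrate the idea of true/false separation along the top principal component growing with depth.

```python
import numpy as np


def top_pcs(acts, k=2):
    """Return the top-k principal directions (as rows) of mean-centered activations."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # SVD of the centered data matrix: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def pc1_separation(acts, labels):
    """Gap between class means along PC1, in units of pooled within-class std."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    proj = centered @ top_pcs(acts, k=1)[0]
    true_proj, false_proj = proj[labels == 1], proj[labels == 0]
    pooled_std = np.sqrt(0.5 * (true_proj.var() + false_proj.var()))
    return abs(true_proj.mean() - false_proj.mean()) / pooled_std


def synthetic_layer(rng, labels, dim, signal):
    """Hypothetical stand-in for one layer's activations (NOT real model data).

    Gaussian noise plus a label-dependent shift of size `signal` along one axis.
    """
    direction = np.zeros(dim)
    direction[0] = 1.0
    noise = rng.normal(size=(labels.size, dim))
    return noise + signal * np.outer(2 * labels - 1, direction)


rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)

# Assumed signal strength per "layer", increasing with depth (illustrative only).
signals = [0.0, 0.5, 2.0, 4.0]
separations = [
    pc1_separation(synthetic_layer(rng, labels, dim=32, signal=s), labels)
    for s in signals
]

for layer, sep in enumerate(separations):
    print(f"layer {layer}: PC1 separation = {sep:.2f}")
```

With real data, `synthetic_layer` would be replaced by activations extracted at each layer for a labeled dataset of true and false statements; the separation score here is one simple summary, not the metric used in the source paper.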
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- Offered to explain the pattern observed in the App. C layer-by-layer PCA analysis
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Layer-wise emergence pattern supporting hierarchical development hypothesis
- Primary visual evidence for linear truth representations in large LLMs
- The representation geometry finding that motivates the question about whether computation mirrors it
- Hypothesized intermediate feature explaining antipodal alignment between cities and neg_cities in early-middle layers
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.787 · Task-specific E3 finding showing compositional reasoning requires deeper processing
- Demonstrates that small models represent surface features rather than abstract truth
- Theoretical interpretation of antipodal alignment and misalignment phenomena in PCA visualizations
- Key empirical result showing that optimizing for behavioral outputs and fitting representation geometry produce the same path in activation space.