finding
active
In LLaMA-2-13B, cities and neg_cities show antipodal alignment in early layers, rotate to orthogonal in middle layers, then eventually align in later layers
Layer-by-layer evolution of truth-direction alignment, supporting the hierarchical abstraction hypothesis
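The layer-wise pattern described here can be illustrated with a minimal sketch. It assumes the truth direction at each layer is estimated as the difference between the mean activation of true statements and the mean activation of false statements, and that alignment is measured by cosine similarity; the source may have used a different estimator (for example, PCA projections). The function names (`mass_mean_direction`, `layerwise_alignment`, `toy_layer`) and the toy activations are hypothetical and do not come from LLaMA-2-13B.

```python
import numpy as np


def mass_mean_direction(acts, labels):
    """Difference of class means: mean(true activations) - mean(false activations)."""
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    return acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def layerwise_alignment(acts_a, labels_a, acts_b, labels_b):
    """Cosine between the two datasets' truth directions at each layer.

    acts_a, acts_b: arrays of shape (n_layers, n_statements, d_model).
    """
    return [
        cosine(mass_mean_direction(a, labels_a), mass_mean_direction(b, labels_b))
        for a, b in zip(acts_a, acts_b)
    ]


def toy_layer(true_vec, n=4):
    """Synthetic layer: n 'true' rows at +true_vec, n 'false' rows at -true_vec."""
    true_vec = np.asarray(true_vec, dtype=float)
    return np.stack([true_vec] * n + [-true_vec] * n)


# Toy data only (assumption): three "layers" built to mimic the described
# antipodal -> orthogonal -> aligned progression.
labels = np.array([True] * 4 + [False] * 4)
e0 = np.array([1.0, 0.0, 0.0, 0.0])
e1 = np.array([0.0, 1.0, 0.0, 0.0])

cities = np.stack([toy_layer(e0), toy_layer(e0), toy_layer(e0)])
neg_cities = np.stack([toy_layer(-e0), toy_layer(e1), toy_layer(e0)])

alignment = layerwise_alignment(cities, labels, neg_cities, labels)
print(alignment)
```

By construction, the toy data gives cosines of -1, 0, and 1 for the three layers. With real activations, each entry of `acts_a` and `acts_b` would hold one layer's residual-stream vectors for the statements in a dataset.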
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
- Hypothesized intermediate feature explaining antipodal alignment between cities and neg_cities in early-middle layers
- Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
- Scale-dependent structural finding from PCA visualizations in §4
- Layer-wise geometry shows early dip, mid-layer alignment, and late standardization across tasks · claim · 0.784 · Qualitative pattern from E3.
- Demonstrates strong anti-correlation between text probability and truth in negated datasets
- Key empirical result showing that optimizing for behavioral outputs and fitting representation geometry produce the same path in activation space.
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.762 · Task-specific E3 finding showing compositional reasoning requires deeper processing