finding

active

finding:linear-structure-in-llama-2-13b-representations-emerges-rapidly-in-early-middle-layers-later-for-conjunctive-statements

Linear structure in LLaMA-2-13B representations emerges rapidly in early-middle layers, later for conjunctive statements

Layer-wise PCA analysis shows hierarchical development of truth representations across the forward pass.
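The following is a minimal sketch of a layer-wise PCA check. It assumes that per-layer activation vectors and true/false labels are already available. The function names, the synthetic "layers", and the separation score are hypothetical illustrations, not the paper's code. For each layer, the sketch centers the activations and finds the top principal component by power iteration. It then measures how far apart true and false statements sit along that component, in units of within-class spread.

```python
import math
import random


def top_principal_component(vectors, iters=100, seed=0):
    """Return (mean, unit top PC) of the vectors via power iteration on X^T X."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to w without forming the covariance matrix explicitly.
        coords = [sum(x[j] * w[j] for j in range(d)) for x in centered]
        w = [sum(c * x[j] for c, x in zip(coords, centered)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
    return mean, w


def separation_on_pc1(vectors, labels):
    """Gap between true/false means along PC1, in units of pooled within-class std."""
    mean, pc = top_principal_component(vectors)
    proj = [sum((v[j] - mean[j]) * pc[j] for j in range(len(pc))) for v in vectors]
    t = [p for p, y in zip(proj, labels) if y]
    f = [p for p, y in zip(proj, labels) if not y]
    mt, mf = sum(t) / len(t), sum(f) / len(f)
    var = (sum((p - mt) ** 2 for p in t) + sum((p - mf) ** 2 for p in f)) / len(proj)
    return abs(mt - mf) / math.sqrt(var)  # abs(): the sign of a PC is arbitrary


def synthetic_layer(truth_strength, n=200, d=8, seed=1):
    """Hypothetical activations: Gaussian noise plus a +/- truth offset on dimension 0."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for i in range(n):
        is_true = i % 2 == 0
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        v[0] += truth_strength if is_true else -truth_strength
        vectors.append(v)
        labels.append(is_true)
    return vectors, labels


# Toy "layers" whose truth signal grows with depth (purely illustrative data).
scores = []
for layer, strength in enumerate([0.0, 0.5, 2.0, 3.0]):
    vecs, labs = synthetic_layer(strength)
    scores.append(separation_on_pc1(vecs, labs))
    print(f"layer {layer}: separation on PC1 = {scores[-1]:.2f}")
```

On real model activations, one would replace `synthetic_layer` with the residual-stream vectors extracted at each layer. The point at which the score jumps corresponds to the "rapid emergence" described above.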

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

We hypothesize that the layer-dependent emergence of linear structure is due to LLMs hierarchically developing an understanding of input data, progressing from surface features to more abstract concepts.
supports
Offered to explain the pattern observed in the layer-by-layer PCA analysis of App. C

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
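The following sketch shows the selection rule above (cosine ≥ 0.65, no typed edge). It assumes that each entity carries an embedding vector and that typed edges are stored as unordered pairs. The entity IDs, vectors, and function names are hypothetical and are not this page's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similar_untyped_neighbors(focus, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to `focus` and no typed edge to it, best first."""
    results = []
    for other, vec in embeddings.items():
        if other == focus:
            continue
        if frozenset((focus, other)) in typed_edges:
            continue  # already linked by a typed relation such as 'supports'
        score = cosine(embeddings[focus], vec)
        if score >= threshold:
            results.append((other, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy graph: 'hyp' is linked to 'focus' by a typed 'supports' edge.
embeddings = {
    "focus": [1.0, 0.0, 0.0],
    "near_dup": [0.95, 0.1, 0.0],
    "related": [0.7, 0.7, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
    "hyp": [0.9, 0.2, 0.0],
}
typed_edges = {frozenset(("focus", "hyp"))}
neighbors = similar_untyped_neighbors("focus", embeddings, typed_edges)
print([name for name, _ in neighbors])  # ['near_dup', 'related']
```

`hyp` is highly similar but is excluded because it already has a typed edge. A near-1.0 score such as `near_dup` is the kind of hit that may signal an unrecognized duplicate.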

In LLaMA-2-13B, salient linear structure in the top PCs rapidly emerges in early-middle layers, with this emergence occurring later for conjunctive statements than for simple statements · finding · 0.899
Layer-wise emergence pattern supporting the hierarchical development hypothesis
PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components · finding · 0.820
Primary visual evidence for linear truth representations in large LLMs
Llama-3.1-8B representations for cyclic concepts are circularly structured · finding · 0.814
The representation-geometry finding that motivates the question of whether computation mirrors it
In early layers, LLaMA-2-13B represents a 'close association' feature that correlates with truth on cities but anti-correlates on neg_cities · claim · 0.808
Hypothesized intermediate feature explaining antipodal alignment between cities and neg_cities in early-middle layers
Math and code tasks show the strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.787
Task-specific E3 finding showing that compositional reasoning requires deeper processing
LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics, such as the presence of the token 'eighty' · finding · 0.784
Demonstrates that small models represent surface features rather than abstract truth
In intermediate regimes of scale or layer depth, LLMs may linearly represent features at intermediate levels of abstraction such as 'accurate factual recall' or 'close association' rather than abstract truth · claim · 0.780
Theoretical interpretation of antipodal alignment and misalignment phenomena in PCA visualizations
The representation-based path and the behavior-based path in Llama-3.1 8B activation space trace out similar curves, demonstrating bidirectional geometry alignment · finding · 0.780
Key empirical result showing that optimizing for behavioral outputs and fitting representation geometry produce the same path in activation space.