finding

active

finding:in-llama-2-13b-larger-than-and-smaller-than-separate-along-antipodal-directions-in-pca-in-llama-2-70b-they-align-along-a-common-direction

In LLaMA-2-13B, larger_than and smaller_than separate along antipodal directions in PCA; in LLaMA-2-70B they align along a common direction

Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
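The page does not say how "antipodal" and "common direction" are quantified. The source judges them from PCA plots. One simple numeric proxy compares the truth directions of two datasets. For each dataset, take the difference-of-means direction: the mean of its true-statement activations minus the mean of its false-statement activations. Then take the cosine between the two directions. A value near −1 suggests antipodal separation, near 0 orthogonal axes, and near +1 a shared direction. This is not the paper's procedure. The function names, the tolerance buckets, and the toy vectors below are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def truth_direction(true_acts, false_acts) -> Vector:
    """Difference-of-means direction: mean(true) - mean(false)."""
    mu_true = mean(true_acts)
    mu_false = mean(false_acts)
    return [t - f for t, f in zip(mu_true, mu_false)]


def cosine(u, v) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_alignment(cos: float, tol: float = 0.3) -> str:
    """Hypothetical buckets for the cosine between two truth directions."""
    if cos <= -1 + tol:
        return "antipodal"
    if cos >= 1 - tol:
        return "aligned"
    if abs(cos) <= tol:
        return "orthogonal"
    return "intermediate"


# Toy activations (hypothetical numbers, not real model outputs).
lt_dir = truth_direction([[1.0, 0.0, 0.0], [1.2, 0.1, 0.0]],
                         [[-1.0, 0.0, 0.0], [-0.8, -0.1, 0.0]])
st_13b_like = truth_direction([[-1.0, 0.0, 0.1]], [[1.0, 0.0, -0.1]])
st_70b_like = truth_direction([[1.0, 0.0, 0.1]], [[-1.0, 0.0, -0.1]])
print(classify_alignment(cosine(lt_dir, st_13b_like)))  # antipodal
print(classify_alignment(cosine(lt_dir, st_70b_like)))  # aligned
```

A single cosine collapses a whole layer's geometry into one number. Plotting the top principal components, as the source does, can reveal structure that this proxy misses.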

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Claims (2)

claim

As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs
supports
Interpretive claim connecting scale to abstraction level in LLM representations
Antipodal alignment between related datasets (e.g., larger_than and smaller_than) in smaller models resolves to common-direction alignment in larger models
supports
Scale-dependent structural finding from PCA visualizations in §4

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
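This section is defined by a cosine similarity of at least 0.65 and the absence of a typed edge. A minimal sketch of that filter follows. It assumes each entity has an embedding vector and that typed edges are stored as unordered ID pairs. The page does not describe its backend, so the data layout, the function name `similar_untyped_neighbors`, and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_untyped_neighbors(
    focus: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to `focus` and no typed edge, best first."""
    results = []
    for other, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed relation.
        if other == focus or frozenset((focus, other)) in typed_edges:
            continue
        score = cosine(embeddings[focus], vec)
        if score >= threshold:
            results.append((other, round(score, 3)))
    # Highest similarity first, matching the descending scores in the listing.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


emb = {"f13b": [1.0, 0.0], "f7b": [0.9, 0.3], "claim": [1.0, 0.1], "far": [0.0, 1.0]}
edges = {frozenset(("f13b", "claim"))}
print(similar_untyped_neighbors("f13b", emb, edges))  # [('f7b', 0.949)]
```

Excluding typed-edge pairs is what makes the remaining hits candidates for new edges or for merging duplicates.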

In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value · finding · 0.810
Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
In LLaMA-2-13B, cities and neg_cities show antipodal alignment in early layers, rotate to orthogonal in middle layers, then eventually align in later layers · finding · 0.805
Layer-by-layer evolution of truth direction alignment, supporting hierarchical abstraction hypothesis
LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty' · finding · 0.804
Demonstrates that small models represent surface features rather than abstract truth
In LLaMA-2-13B, cities and neg_cities show approximately orthogonal axes of separation in PCA visualizations at intermediate layers · finding · 0.798
Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
In LLaMA-2-13B, salient linear structure in the top PCs rapidly emerges in early-middle layers, with this emergence occurring later for conjunctive statements than simple statements · finding · 0.779
Layer-wise emergence pattern supporting hierarchical development hypothesis
For LLaMA-2-70B, probes trained on larger_than+smaller_than achieve >95% accuracy on sp_en_trans regardless of probing technique · finding · 0.775
Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components · finding · 0.771
Primary visual evidence for linear truth representations in large LLMs
The representation-based path and the behavior-based path in Llama-3.1 8B activation space trace out similar curves, demonstrating bidirectional geometry alignment. · finding · 0.763
Key empirical result showing that optimizing for behavioral outputs and fitting representation geometry produce the same path in activation space.
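One listed finding trains probes for LLaMA-2-70B on larger_than+smaller_than and tests them on sp_en_trans. The sketch below shows only that train-on-one, evaluate-on-another protocol. The probe here is a plain difference-of-means direction, thresholded at the midpoint between the two class means. This is a simplified stand-in and not necessarily any of the probing techniques the paper compares. The function names and toy activations are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], bool]  # (activation vector, is_true label)


def fit_diff_of_means_probe(train: List[Example]) -> Tuple[List[float], List[float]]:
    """Return (direction, midpoint) estimated from labelled training activations."""
    true_acts = [x for x, y in train if y]
    false_acts = [x for x, y in train if not y]
    mu_t = [sum(c) / len(true_acts) for c in zip(*true_acts)]
    mu_f = [sum(c) / len(false_acts) for c in zip(*false_acts)]
    direction = [t - f for t, f in zip(mu_t, mu_f)]
    midpoint = [(t + f) / 2 for t, f in zip(mu_t, mu_f)]
    return direction, midpoint


def predict(probe, x: Sequence[float]) -> bool:
    """Classify as true if x projects past the midpoint along the direction."""
    direction, midpoint = probe
    return sum(d * (xi - m) for d, xi, m in zip(direction, x, midpoint)) > 0


def accuracy(probe, test: List[Example]) -> float:
    """Fraction of test examples whose predicted label matches the given label."""
    return sum(predict(probe, x) == y for x, y in test) / len(test)


# Hypothetical "source" (training) and "target" (evaluation) datasets.
train = [([2.0, 0.5], True), ([1.5, -0.5], True), ([-2.0, 0.3], False), ([-1.5, -0.2], False)]
test = [([1.0, 3.0], True), ([0.8, -2.0], True), ([-0.9, 2.5], False), ([-1.2, -1.0], False)]
probe = fit_diff_of_means_probe(train)
print(accuracy(probe, test))  # 1.0
```

In this toy data, the target set varies strongly along the second axis while truth still lies along the first. The probe transfers only because both datasets share that truth direction, which is the property the cross-domain result is taken as evidence for.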