finding

active

PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components

Primary visual evidence for linear truth representations in large LLMs
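
As a rough illustration of the kind of analysis this finding describes, the sketch below projects a set of vectors onto their top two principal components and measures how far apart the two label groups fall along the first component. It is only a sketch under stated assumptions: the activations are synthetic (Gaussian noise plus a shift along a made-up direction called truth_dir), not LLaMA-2 representations, and the helper name top_two_pcs is hypothetical rather than taken from the source paper.

```python
import numpy as np

def top_two_pcs(acts):
    """Project row vectors onto their top two principal components."""
    centered = acts - acts.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
n, d = 200, 64
labels = rng.integers(0, 2, size=n)  # 1 = true, 0 = false (synthetic labels)

# Assumption: a single unit-norm direction carries the true/false distinction.
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)

# Synthetic stand-in for activations: isotropic noise plus a label-dependent shift.
acts = rng.normal(size=(n, d)) + 6.0 * np.outer(2 * labels - 1, truth_dir)

proj = top_two_pcs(acts)
pc1 = proj[:, 0]
# The sign of a principal component is arbitrary, so compare group means by absolute gap.
gap = abs(pc1[labels == 1].mean() - pc1[labels == 0].mean())
print(f"gap between true and false group means along PC1: {gap:.2f}")
```

With real model activations, acts would instead hold one residual-stream vector per statement, and a scatter plot of proj colored by labels would play the role of the PCA visualizations referred to above.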

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Claims (1)

claim

LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the likely dataset performing poorly on anti-correlated datasets
associated_with · supports
Establishes that the observed linear structure is not merely a representation of text probability

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value · finding · 0.842
Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
Linear structure in LLaMA-2-13B representations emerges rapidly in early-middle layers, later for conjunctive statements · finding · 0.820
Layer-wise PCA analysis shows hierarchical development of truth representations across forward pass
In LLaMA-2-13B, salient linear structure in the top PCs rapidly emerges in early-middle layers, with this emergence occurring later for conjunctive statements than simple statements · finding · 0.813
Layer-wise emergence pattern supporting hierarchical development hypothesis
In LLaMA-2-13B, cities and neg_cities show approximately orthogonal axes of separation in PCA visualizations at intermediate layers · finding · 0.805
Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty' · finding · 0.787
Demonstrates that small models represent surface features rather than abstract truth
In LLaMA-2-13B, larger_than and smaller_than separate along antipodal directions in PCA; in LLaMA-2-70B they align along a common direction · finding · 0.771
Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.766
Central interpretive claim of the paper supported by causal ablation and activation evidence
LLaMA-2-70B displays summarization behavior over punctuation tokens in a context-dependent way: present for cities but not for sp_en_trans · finding · 0.766
Contrasts with 7B and 13B which show consistent summarization behavior; may complicate localization at 70B scale
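
The similarity listing above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter over entity embeddings. This is a hedged illustration only: the page does not state which embedding model or ranking procedure produced the scores, and the function name similar_without_edge, the toy two-dimensional vectors, and the entity names are all hypothetical.

```python
import numpy as np

def similar_without_edge(query, candidates, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs at or above threshold, skipping entities
    that already have a typed edge, sorted by descending similarity."""
    q = query / np.linalg.norm(query)
    out = []
    for name, vec in candidates.items():
        if name in typed_edges:
            continue  # already linked by a typed relation
        cos = float(q @ vec / np.linalg.norm(vec))
        if cos >= threshold:
            out.append((name, cos))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (assumed, for illustration only).
query = np.array([1.0, 0.0])
candidates = {
    "a": np.array([1.0, 0.1]),  # nearly parallel to the query
    "b": np.array([0.0, 1.0]),  # orthogonal to the query
    "c": np.array([2.0, 0.0]),  # parallel, but already has a typed edge
}
result = similar_without_edge(query, candidates, typed_edges={"c"})
```

Here only "a" survives: "b" falls below the threshold and "c" is excluded because it already has a typed relation.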