finding

active

finding:in-llama-2-7b-pca-of-larger-than-smaller-than-shows-statements-clustering-by-surface-level-characteristics-e-g-presence-of-token-eighty-rather-than-truth-value

In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value

Suggests that the smallest model (LLaMA-2-7B) lacks an abstract truth representation, supporting the claim that such representations emerge with scale
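As a rough illustration of the kind of check this finding describes, the sketch below runs PCA on synthetic vectors and compares how strongly the first principal component separates a surface feature versus a truth label. Everything here is an assumption made for illustration: the data is randomly generated rather than taken from LLaMA-2-7B, and the names `top_pcs`, `separation`, `has_token`, and `is_true` are hypothetical, not from the source paper.

```python
import numpy as np

def top_pcs(acts, k=2):
    """Project mean-centered activations onto their top-k principal components."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def separation(proj, labels):
    """Gap between the two class means along PC1, in units of PC1's std."""
    pc1 = proj[:, 0]
    gap = abs(pc1[labels == 1].mean() - pc1[labels == 0].mean())
    return gap / pc1.std()

# Synthetic stand-in for activations (assumption: not real model data).
rng = np.random.default_rng(0)
n, d = 400, 64
has_token = rng.integers(0, 2, n)  # hypothetical surface feature, e.g. a token is present
is_true = rng.integers(0, 2, n)    # hypothetical truth label, independent of the feature

surface_dir = np.zeros(d)
surface_dir[0] = 1.0
# Only the surface feature shifts the vectors; truth has no encoded direction.
acts = rng.normal(size=(n, d)) + 8.0 * has_token[:, None] * surface_dir

proj = top_pcs(acts)
surface_sep = separation(proj, has_token)
truth_sep = separation(proj, is_true)
print(f"surface separation: {surface_sep:.2f}, truth separation: {truth_sep:.2f}")
```

In this toy setup the top component tracks the surface feature because that is the only structured variance in the data; with real activations one would apply the same comparison to the extracted hidden states.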

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Claims (1)

claim

As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs
supports
Interpretive claim connecting scale to abstraction level in LLM representations

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty' · finding · 0.887
Demonstrates that small models represent surface features rather than abstract truth
PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components · finding · 0.842
Primary visual evidence for linear truth representations in large LLMs
In LLaMA-2-13B, larger_than and smaller_than separate along antipodal directions in PCA; in LLaMA-2-70B they align along a common direction · finding · 0.810
Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
LLaMA-2-70B displays summarization behavior over punctuation tokens in a context-dependent way: present for cities but not for sp_en_trans · finding · 0.789
Contrasts with 7B and 13B, which show consistent summarization behavior; may complicate localization at 70B scale
PCA analysis shows token embeddings and unembeddings are concentrated in a relatively small fraction of residual stream dimensions in large models · finding · 0.787
Supporting evidence for the claim that most residual stream dimensions are free for other layers to use
In LLaMA-2-13B, salient linear structure in the top PCs rapidly emerges in early-middle layers, with this emergence occurring later for conjunctive statements than simple statements · finding · 0.783
Layer-wise emergence pattern supporting hierarchical development hypothesis
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.780
Central interpretive claim of the paper supported by causal ablation and activation evidence
Patching group (b) hidden states (over clause-ending punctuation, early-middle layers) in LLaMA-2-13B produces the strongest causal effect on TRUE/FALSE output predictions · finding · 0.770
Localizes truth representations to specific hidden states, motivating the rest of the analysis
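
The "Related by similarity" filter above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is a minimal illustration under assumptions: the function `related_by_similarity`, the toy two-dimensional vectors, and the entity names are all hypothetical, and the actual embedding model and ranking logic behind this page are not stated in the source.

```python
import numpy as np

def related_by_similarity(query, candidates, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs at or above threshold, skipping typed neighbors."""
    q = query / np.linalg.norm(query)
    results = []
    for name, vec in candidates.items():
        if name in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        cos = float(q @ (vec / np.linalg.norm(vec)))
        if cos >= threshold:
            results.append((name, round(cos, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (assumption: real entity embeddings would be much higher-dimensional).
query = np.array([1.0, 0.0])
candidates = {
    "near": np.array([0.9, 0.1]),
    "far": np.array([0.0, 1.0]),
    "linked": np.array([1.0, 0.0]),
}
related = related_by_similarity(query, candidates, typed_edges={"linked"})
print(related)
```

Only "near" survives here: "far" falls below the threshold, and "linked" is excluded because it already has a typed edge.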