finding
active
PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components
Primary visual evidence for linear truth representations in large LLMs
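To make the kind of analysis behind this finding concrete, the following is a minimal sketch of projecting labeled vectors onto their top two principal components and checking whether the first component separates the two classes. It runs on synthetic vectors, not real LLaMA-2 activations. The helper names (make_synthetic, pca_top2, sign_accuracy), the planted "truth direction", and the use of power iteration in place of a library PCA routine are all illustrative assumptions and are not taken from the source paper.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_synthetic(n_per_class=100, dim=8, seed=0):
    """Hypothetical stand-in for activations: Gaussian noise plus a planted
    offset of +3 (true) or -3 (false) along the first coordinate."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (1, 0):
        offset = 3.0 if label else -3.0
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(dim)]
            row[0] += offset
            rows.append(row)
            labels.append(label)
    return rows, labels


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, rng, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric PSD matrix,
    found by power iteration."""
    d = len(cov)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [dot(cov[i], v) for i in range(d)]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    eigval = dot(v, [dot(cov[i], v) for i in range(d)])
    return v, eigval


def pca_top2(rows, seed=0):
    """Return 2-D coordinates of each row plus the two component vectors."""
    rng = random.Random(seed)
    x = center(rows)
    cov = covariance(x)
    pc1, lam1 = top_component(cov, rng)
    # Deflation: remove the variance explained by pc1, then repeat.
    d = len(cov)
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = top_component(deflated, rng)
    coords = [(dot(r, pc1), dot(r, pc2)) for r in x]
    return coords, pc1, pc2


def sign_accuracy(coords, labels):
    """Fraction of points whose PC1 sign matches the label, taking the
    better of the two sign conventions (PCA directions have arbitrary sign)."""
    hits = sum((c[0] > 0) == bool(y) for c, y in zip(coords, labels))
    return max(hits, len(labels) - hits) / len(labels)


if __name__ == "__main__":
    rows, labels = make_synthetic()
    coords, pc1, pc2 = pca_top2(rows)
    print("PC1 sign accuracy:", sign_accuracy(coords, labels))
```

In a real replication, the rows would be residual-stream activations extracted from the model for each statement, and the 2-D coordinates would be scatter-plotted and colored by label. The sketch only shows why a dominant linear class offset appears as separation along the first principal component.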
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (1)
claim
- LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the likely dataset performing poorly on anti-correlated datasets · associated_with · supports · Establishes that the observed linear structure is not merely a representation of text probability
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
- Layer-wise PCA analysis shows hierarchical development of truth representations across forward pass
- Layer-wise emergence pattern supporting hierarchical development hypothesis
- Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
- Demonstrates that small models represent surface features rather than abstract truth
- Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.766 · Central interpretive claim of the paper supported by causal ablation and activation evidence
- Contrasts with 7B and 13B, which show consistent summarization behavior; may complicate localization at 70B scale