claim

active

claim:as-llms-scale-they-develop-increasingly-general-abstractions-with-large-models-linearly-representing-abstract-concepts-like-truth-that-capture-shared-properties-of-diverse-inputs

As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs

Interpretive claim connecting scale to abstraction level in LLM representations

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Papers (1)

paper

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
introduces

Findings (6)

finding

LLaMA-2-70B and 13B probes generalize better across datasets than 7B probes across all training sets and probe types
associated_with · supports
Larger models linearly represent more general concepts including truth
LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty'
associated_with · supports
Demonstrates that small models represent surface features rather than abstract truth
For LLaMA-2-70B, probes trained on larger_than+smaller_than achieve >95% accuracy on sp_en_trans regardless of probing technique
supports
Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
In LLaMA-2-13B, cities and neg_cities show approximately orthogonal axes of separation in PCA visualizations at intermediate layers
supports
Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
In LLaMA-2-13B, larger_than and smaller_than separate along antipodal directions in PCA; in LLaMA-2-70B they align along a common direction
supports
Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value
supports
Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim

Concepts (1)

concept

Antipodal Alignment of Truth Directions
associated_with
The case where two datasets (e.g., larger_than and smaller_than) separate along opposite directions in PCA, indicating a shared feature with opposite sign
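
The alignment cases described in the findings above (common direction, antipodal, approximately orthogonal) can be illustrated numerically. The following is a minimal sketch on synthetic data, not the paper's procedure: it swaps the PCA visualizations for a difference-of-means direction per dataset and compares the two directions by cosine similarity. The function names, the 16-dimensional space, and the noise level are all assumptions made for illustration.

```python
import numpy as np


def diff_of_means_direction(acts, labels):
    """Unit vector pointing from the mean of false rows to the mean of true rows."""
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    direction = acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)
    return direction / np.linalg.norm(direction)


def alignment(acts_a, labels_a, acts_b, labels_b):
    """Cosine similarity between the separation directions of two datasets.

    Near +1: common direction. Near -1: antipodal. Near 0: roughly orthogonal.
    """
    dir_a = diff_of_means_direction(acts_a, labels_a)
    dir_b = diff_of_means_direction(acts_b, labels_b)
    return float(dir_a @ dir_b)


def make_synthetic(rng, axis, sign, n=200, noise=0.1):
    """Hypothetical stand-in for activations: true/false points offset along sign * axis."""
    labels = np.arange(n) % 2 == 0  # alternate true/false so both classes are present
    offsets = np.where(labels, 1.0, -1.0)[:, None] * sign * axis
    acts = offsets + noise * rng.standard_normal((n, axis.size))
    return acts, labels


rng = np.random.default_rng(0)
e0 = np.eye(16)[0]  # assumed shared feature axis
e1 = np.eye(16)[1]  # assumed unrelated axis

same = alignment(*make_synthetic(rng, e0, +1), *make_synthetic(rng, e0, +1))
antipodal = alignment(*make_synthetic(rng, e0, +1), *make_synthetic(rng, e0, -1))
orthogonal = alignment(*make_synthetic(rng, e0, +1), *make_synthetic(rng, e1, +1))
print(f"same={same:.3f} antipodal={antipodal:.3f} orthogonal={orthogonal:.3f}")
```

With real model activations in place of `make_synthetic`, a strongly negative score would correspond to the antipodal case defined here, and a score near zero to the misaligned case noted for cities and neg_cities.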

Claims (1)

claim

LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the likely dataset performing poorly on datasets where truth and likelihood are anti-correlated
extends
Establishes that the observed linear structure is not merely a representation of text probability

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

In intermediate regimes of scale or layer depth, LLMs may linearly represent features at intermediate levels of abstraction such as 'accurate factual recall' or 'close association' rather than abstract truth · claim · 0.879
Theoretical interpretation of antipodal alignment and misalignment phenomena in PCA visualizations
Representational abstraction of truth may emerge more clearly with model scale · claim · 0.849
Interpretation of weaker PCA separation and lower ASR in smaller models
LLMs hierarchically develop understanding of their input data, progressing from surface-level features in early layers to more abstract concepts in later layers · claim · 0.840
Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
Do LLMs have a unified representation of truth that spans structurally and topically diverse data? · question · 0.839
Central research question driving dataset design and experimental approach
It is plausible that ongoing developments in LLMs may lead to models or agentic systems built on LLMs capable of generating representations observed with 'consciousness' phenomena. · claim · 0.828
Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
LLMs internalize deeply integrated representations of high-order concepts. · claim · 0.818
The authors' interpretive assertion based on their steering results.
LLM representations exhibit intriguing patterns under spatio-permutational analyses, suggesting a potentially profound yet tentative indication of consciousness. · claim · 0.813
Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria.
Truth may be linearly separable in the model's representation space, but the structure is richer than a single linear axis · claim · 0.807
Interpretive synthesis of DIM and cone intervention successes