claim

active

claim:in-intermediate-regimes-of-scale-or-layer-depth-llms-may-linearly-represent-features-at-intermediate-levels-of-abstraction-such-as-accurate-factual-recall-or-close-association-rather-than-abstract-truth

In intermediate regimes of scale or layer depth, LLMs may linearly represent features at intermediate levels of abstraction such as 'accurate factual recall' or 'close association' rather than abstract truth

Theoretical interpretation of antipodal alignment and misalignment phenomena in PCA visualizations

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Concepts (1)

concept

Close Association Feature
supports
A hypothesized intermediate-level linearly-represented feature (e.g., Beijing and China are closely associated) that may correlate with truth in unnegated datasets but anti-correlate in negated ones
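
As a toy illustration of the hypothesis above, the following sketch builds synthetic "activations" that linearly encode only a close-association feature, then fits a difference-of-means direction on unnegated statements and evaluates it on negated ones. Everything here is an assumption made for illustration: the data is random rather than taken from any model, and the names `make_dataset`, `difference_of_means_direction`, and `accuracy` are hypothetical helpers, not the source paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n, negated, dim=16, noise=0.1):
    """Synthetic activations that encode only a close-association feature.

    Assumed setup: axis 0 holds the association feature (+1 when a statement
    pairs closely associated entities, -1 otherwise). For unnegated
    statements truth equals association; for negated ones it is flipped.
    """
    assoc = rng.integers(0, 2, size=n) * 2 - 1  # +1 or -1
    acts = noise * rng.standard_normal((n, dim))
    acts[:, 0] += assoc
    is_true = (assoc == -1) if negated else (assoc == 1)
    return acts, is_true

def difference_of_means_direction(acts, is_true):
    """Probe direction: mean activation of true minus mean of false."""
    return acts[is_true].mean(axis=0) - acts[~is_true].mean(axis=0)

def accuracy(direction, acts, is_true):
    """Predict 'true' when the projection onto the direction is positive."""
    preds = acts @ direction > 0
    return float((preds == is_true).mean())

# Fit the direction on unnegated statements only.
train_acts, train_labels = make_dataset(500, negated=False)
direction = difference_of_means_direction(train_acts, train_labels)

# Evaluate on fresh unnegated and negated statements.
plain_acts, plain_labels = make_dataset(500, negated=False)
neg_acts, neg_labels = make_dataset(500, negated=True)

acc_plain = accuracy(direction, plain_acts, plain_labels)
acc_negated = accuracy(direction, neg_acts, neg_labels)
print(f"unnegated accuracy: {acc_plain:.2f}")
print(f"negated accuracy:   {acc_negated:.2f}")
```

In this toy setting the direction separates unnegated statements well but scores far below chance on negated ones, which is the anti-correlation pattern the concept describes. It says nothing about whether real LLM activations behave this way.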

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs · claim · 0.879
Interpretive claim connecting scale to abstraction level in LLM representations
LLMs hierarchically develop understanding of their input data, progressing from surface-level features in early layers to more abstract concepts in later layers · claim · 0.848
Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the 'likely' dataset performing poorly on anti-correlated datasets · claim · 0.845
Establishes that the observed linear structure is not merely a representation of text probability
Linear truth directions in LLMs are reliable primarily in factual recall cases and break down when truth assessment depends on computing and storing intermediate results · claim · 0.823
Central empirical conclusion of the paper about the fundamental limits of truth directions.
Do LLMs have a unified representation of truth that spans structurally and topically diverse data? · question · 0.817
Central research question driving dataset design and experimental approach
We hypothesize that the layer-wise emergence of linear structure is due to LLMs hierarchically developing understanding of their input data, progressing from surface-level features to more abstract concepts · hypothesis · 0.816
Stated explicitly in App. C to explain why linear structure emerges later for conjunctive statements
LLM representations exhibit intriguing patterns under spatio-permutational analyses, suggesting a potentially profound yet tentative indication of consciousness · claim · 0.810
Qualified positive claim from spatio-permutational analysis, in which two cases satisfy all three criteria.
Representational abstraction of truth may emerge more clearly with model scale · claim · 0.810
Interpretation of weaker PCA separation and lower ASR in smaller models