claim
active
As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs
Interpretive claim connecting scale to abstraction level in LLM representations
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (6)
finding
- LLaMA-2-70B and 13B probes generalize better across datasets than 7B probes across all training sets and probe types · associated_with · supports · Larger models linearly represent more general concepts including truth
- LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty' · associated_with · supports · Demonstrates that small models represent surface features rather than abstract truth
- Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
- Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
- Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
- Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
Concepts (1)
concept
- Antipodal Alignment of Truth Directions · associated_with · The case where two datasets (e.g., larger_than and smaller_than) separate along opposite directions in PCA, indicating a shared feature with opposite sign
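The antipodal alignment concept above can be illustrated with a small synthetic sketch. It does not use real LLaMA-2 activations and is not the source paper's code: the helper names (`make_dataset`, `top_pc`, `separation`), the 16-dimensional toy activations, and the noise level are all assumptions made for illustration. Two toy datasets share one feature axis but encode "true" with opposite signs, so their true-minus-false separations along the first principal component have opposite signs.

```python
import numpy as np

def make_dataset(feature, sign, rng, n=200, noise=0.1):
    """Toy activations (hypothetical): true statements sit at +sign*feature,
    false statements at -sign*feature, plus Gaussian noise."""
    labels = np.arange(n) % 2  # alternate false (0) and true (1)
    offsets = np.where(labels[:, None] == 1, sign, -sign) * feature
    acts = offsets + noise * rng.standard_normal((n, feature.size))
    return acts, labels

def top_pc(acts):
    """First principal component of the centered activations, via SVD."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def separation(acts, labels, direction):
    """Mean projection of true statements minus that of false statements."""
    proj = acts @ direction
    return float(proj[labels == 1].mean() - proj[labels == 0].mean())

rng = np.random.default_rng(0)
feature = np.zeros(16)
feature[0] = 1.0  # the shared feature axis (an assumption of this toy setup)

# Dataset A encodes truth as +feature; dataset B encodes it as -feature.
acts_a, labels_a = make_dataset(feature, +1.0, rng)
acts_b, labels_b = make_dataset(feature, -1.0, rng)

# PCA on the pooled data, then measure each dataset's separation along PC1.
pc1 = top_pc(np.vstack([acts_a, acts_b]))
sep_a = separation(acts_a, labels_a, pc1)
sep_b = separation(acts_b, labels_b, pc1)

# Opposite signs mean the datasets separate along opposite directions.
print("antipodal:", sep_a * sep_b < 0)
```

The sign of a principal component is arbitrary, so the check uses the product of the two separations rather than the sign of either one.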
Claims (1)
claim
- Establishes that the observed linear structure is not merely a representation of text probability
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Theoretical interpretation of antipodal alignment and misalignment phenomena in PCA visualizations
- Interpretation of weaker PCA separation and lower ASR in smaller models
- Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
- Do LLMs have a unified representation of truth that spans structurally and topically diverse data? · question · 0.839 · Central research question driving dataset design and experimental approach
- Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
- The authors' interpretive assertion based on their steering results.
- Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria.
- Interpretive synthesis of DIM and cone intervention successes