finding
active
finding:llama-2-70b-and-13b-probes-generalize-better-across-datasets-than-7b-probes-across-all-training-sets-and-probe-types
LLaMA-2-70B and 13B probes generalize better across datasets than 7B probes across all training sets and probe types
Larger models linearly represent more general concepts including truth
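The finding compares how well linear truth probes trained on one true/false dataset transfer to others. Below is a minimal sketch of that measurement using a mass-mean (MM) probe, assuming "MM" here means the mass-mean probe (direction = mean activation of true statements minus mean activation of false ones). The activations are synthetic, the midpoint threshold is an assumption, and the names `fit_mass_mean_probe`, `probe_accuracy`, and `make_dataset` are hypothetical. The toy datasets share one truth direction by construction, so the demo illustrates how transfer is measured, not evidence for the finding.

```python
import numpy as np


def fit_mass_mean_probe(acts, labels):
    """Fit a mass-mean probe with no gradient-based optimization.

    acts: (n, d) activations; labels: (n,) with 1 = true statement, 0 = false.
    Returns (direction, threshold).
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    mu_true = acts[labels].mean(axis=0)
    mu_false = acts[~labels].mean(axis=0)
    direction = mu_true - mu_false
    # Assumption: classify by thresholding at the projection of the midpoint of the class means.
    threshold = direction @ (mu_true + mu_false) / 2
    return direction, threshold


def probe_accuracy(direction, threshold, acts, labels):
    """Fraction of examples whose projection falls on the correct side of the threshold."""
    preds = np.asarray(acts, dtype=float) @ direction > threshold
    return float(np.mean(preds == np.asarray(labels, dtype=bool)))


# Hypothetical toy data: both "datasets" encode truth along the same direction.
rng = np.random.default_rng(0)
d = 16
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)


def make_dataset(n, offset):
    """Noisy activations shifted by a dataset-specific offset, +/- 2 along truth_dir."""
    labels = rng.integers(0, 2, size=n)
    signs = np.where(labels == 1, 1.0, -1.0)
    acts = rng.normal(scale=0.5, size=(n, d)) + offset + 2.0 * signs[:, None] * truth_dir
    return acts, labels


# The test dataset gets an offset orthogonal to truth_dir, i.e. it differs only in non-truth features.
test_offset = rng.normal(size=d)
test_offset -= (test_offset @ truth_dir) * truth_dir

train_acts, train_labels = make_dataset(500, offset=np.zeros(d))
test_acts, test_labels = make_dataset(500, offset=test_offset)

w, b = fit_mass_mean_probe(train_acts, train_labels)
train_acc = probe_accuracy(w, b, train_acts, train_labels)
transfer_acc = probe_accuracy(w, b, test_acts, test_labels)
print(f"train acc {train_acc:.2f}, cross-dataset acc {transfer_acc:.2f}")
```

Fitting needs only two class means, which is what makes MM probes optimization-free. In a real experiment the two arrays would be model activations on two different true/false datasets, and the cross-dataset accuracy would be compared across model sizes.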
Source paper
extracted_from (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (1)
claim
- As LLMs scale, they develop increasingly general abstractions, with large models linearly representing abstract concepts like truth that capture shared properties of diverse inputs · associated_with · supports · Interpretive claim connecting scale to abstraction level in LLM representations
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
- Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans
- Despite being simpler and optimization-free, MM probes match accuracy of other techniques at scale
- Supporting finding showing ESR is driven by both higher multi-attempt rates and comparable improvement rates
- Model-specific difference in persona susceptibility
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.812 · Central interpretive claim of the paper supported by causal ablation and activation evidence
- Shows behavioral pattern of self-correction is trainable in smaller models
- 26 candidate off-topic detector latents identified in Llama-3.3-70B via contrastive search · finding · 0.803 · Core mechanistic finding identifying specific SAE latents associated with ESR
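The "Related by similarity" rule (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This assumes each entity has an embedding vector and that typed edges are stored as unordered ID pairs; the function names, the data layout, and the descending sort by score are illustrative assumptions, not this graph's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped_neighbors(entity_id, embeddings, typed_edges, threshold=0.65):
    """Return (other_id, score) pairs with cosine >= threshold and no typed edge, highest score first.

    embeddings: dict mapping entity id -> vector (hypothetical layout).
    typed_edges: set of frozenset({id_a, id_b}) for entities already linked by a typed relation.
    """
    query = embeddings[entity_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == entity_id:
            continue
        if frozenset((entity_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation, so not a new-edge candidate
        score = cosine(query, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Usage with made-up vectors: "claim:b" is excluded by its typed edge, "claim:c" by the threshold.
demo = {
    "finding:a": [1.0, 0.0],
    "claim:b": [0.9, 0.1],
    "claim:c": [0.0, 1.0],
    "finding:d": [0.8, 0.6],
}
print(similar_untyped_neighbors("finding:a", demo, {frozenset(("finding:a", "claim:b"))}))
```

Because entities with a typed edge are skipped, everything this returns is either a missing relation or a possible duplicate, matching the two uses the page names.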