claim
active
claim:introspective-capacity-scales-with-model-size-for-some-concepts-approaching-near-perfect-coupling-in-llama-3-1-8b
Introspective capacity scales with model size for some concepts, approaching near-perfect coupling in LLaMA-3.1-8B
Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling
Source paper
extracted_from · (2026) · Nicolas Martorell · Bianchi, Bruno
Neighborhood — ranked by edge-count
Findings (5)
finding
- Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹ · associated_with · supports · Strong scaling trend for introspective fidelity when invalid steering-sign pairs are excluded
- LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · contradicts · Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
- Largest single-step scaling improvement; demonstrates dramatic introspection gain between 1B and 3B models for interest
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · supports · Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
- Confirms scaling trend for wellbeing concept between smallest and middle model size
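The findings above report a Spearman ρ and an isotonic R² between a probe readout and the model's self-report. The sketch below shows one way such statistics could be computed in plain Python. It is an illustration under stated assumptions, not the source paper's pipeline: the function names (`spearman_rho`, `isotonic_r2`) and the toy data are hypothetical, and the isotonic fit assumes distinct probe values (ties in `x` are not pooled).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool-adjacent-violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = []
    for s, c in blocks:
        fitted_sorted.extend([s / c] * c)
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted


def isotonic_r2(x, y):
    """R² of the isotonic fit: 1 - SS_res / SS_tot."""
    fitted = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Toy data (hypothetical): probe readouts and self-reported ratings.
probe = [1.0, 2.0, 3.0]
report = [1.0, 3.0, 2.0]
print(spearman_rho(probe, report))  # 0.5
print(isotonic_r2(probe, report))   # 0.75
```

Under this reading, a high ρ says the self-report rises and falls in the same order as the probe readout, while a high isotonic R² says a monotone function of the probe explains most of the variance in the report.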
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Practical bottleneck explaining why these phenomena are not widely studied.
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- Introspective capabilities appear only in very large models (>70B), with 70B barely on the threshold; bottleneck for independent research.
- Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures · hypothesis · 0.833 · The paper treats this as possible but unconfirmed; current evidence shows concept-specific scaling only
- A caveat qualifying the main claim.
- Contradicts expectation from emergent abilities literature; however, interpreted cautiously due to methodological limitations.
- Demonstrates introspection is present from the first conversation turn without needing multi-turn context
- Is introspection an emergent property of scale, or do smaller open-weight models exhibit similar capabilities? · question · 0.803 · Motivates comparison of Llama 3.1 8B results against Lindsey's frontier model findings