claim

active

claim:introspective-capacity-scales-with-model-size-for-some-concepts-approaching-near-perfect-coupling-in-llama-3-1-8b

Introspective capacity scales with model size for some concepts, approaching near-perfect coupling in LLaMA-3.1-8B

Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Findings (5)

finding

Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹
associated_with · supports
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52)
contradicts
Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66
supports
Largest single-step scaling improvement; demonstrates dramatic introspection gain between 1B and 3B models for interest
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰)
supports
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Wellbeing introspection improves from 1B to 3B: ρ from 0.48 to 0.66, R² from 0.26 to 0.45
supports
Confirms scaling trend for wellbeing concept between smallest and middle model size
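
The findings above report probe-report coupling as a Spearman ρ and, for the 8B wellbeing result, an isotonic R². As a minimal sketch of how these two statistics could be computed for paired probe readouts and self-reports, the pure-Python example below uses average ranks for ρ and pool-adjacent-violators for the isotonic fit. The function names and the toy data are assumptions made for illustration; the paper's actual pipeline (including its LMM analysis) is not reproduced here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool-adjacent-violators).

    Simplification: tied x values are treated as separate, adjacent points.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, number of points]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = [0.0] * len(x)
    pos = 0
    for s, n in blocks:
        for i in order[pos:pos + n]:
            fitted[i] = s / n
        pos += n
    return fitted


def isotonic_r2(x, y):
    """Share of variance in y explained by the monotone fit on x."""
    fitted = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Toy values, invented for illustration only (not data from the paper).
probe_readout = [1.0, 2.0, 3.0, 4.0]   # hypothetical probe projections
self_report = [1.0, 3.0, 2.0, 4.0]     # hypothetical numeric self-reports
print(spearman_rho(probe_readout, self_report))
print(isotonic_r2(probe_readout, self_report))
```

Both statistics depend only on the ordering of the probe readouts, so they capture monotone coupling without assuming a linear probe-report relationship.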

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Introspective capabilities have threshold effects requiring very large models; 70B models are barely on the threshold, and independent researchers lack access to larger models. · claim · 0.839
Practical bottleneck explaining why these phenomena are not widely studied.
We hypothesize that introspective capabilities may scale with model size and architecture, including recurrence/looping that extends the integration window · hypothesis · 0.834
Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
model size threshold for introspection · concept · 0.833
Introspective capabilities appear only in very large models (>70B), with 70B barely on the threshold; bottleneck for independent research.
Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures · hypothesis · 0.833
The paper treats this as possible but unconfirmed; current evidence shows concept-specific scaling only
This introspective capacity is highly unreliable and context-dependent in today's models · claim · 0.820
A caveat qualifying the main claim.
No significant disparity in potential consciousness indicators was found between larger models (Mixtral-8x7B, LLaMA3.1-70B) and smaller counterparts (Mistral-7B, LLaMA3.1-8B). · finding · 0.813
Contradicts expectation from emergent abilities literature; however, interpreted cautiously due to methodological limitations.
Wellbeing introspective strength at turn 1: ρ=0.52, p=5.46×10⁻⁴ in LLaMA-3.2-3B · finding · 0.805
Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Is introspection an emergent property of scale, or do smaller open-weight models exhibit similar capabilities? · question · 0.803
Motivates comparison of Llama 3.1 8B results against Lindsey's frontier model findings