finding

active

finding:interest-introspection-improves-from-1b-to-3b-from-0-19-to-0-80-r2-from-0-05-to-0-66

Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66

Largest single-step scaling improvement; interest introspection rises sharply between the 1B and 3B models
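As a rough illustration of the two statistics quoted in this finding, the sketch below computes a Spearman rank correlation (ρ) and a linear R² between a probe score and a self-reported score. The variable names and toy data are hypothetical. The card does not say how R² was fitted (other entries mention an isotonic R²), so the linear fit here is an assumption rather than the paper's procedure.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def linear_r2(x, y):
    """R^2 of a least-squares line with intercept (equals squared Pearson r)."""
    return pearson(x, y) ** 2


# Hypothetical toy data: probe readout vs. numeric self-report.
probe = [0.1, 0.4, 0.2, 0.9, 0.7]
report = [2, 5, 3, 9, 6]
rho = spearman_rho(probe, report)
r2 = linear_r2(probe, report)
```

Because ρ depends only on rank order while R² depends on the fitted functional form, the two can diverge; that is consistent with entries on this page where ρ² and the reported R² do not match.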

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (1)

claim

Introspective capacity scales with model size for some concepts, approaching near-perfect coupling in LLaMA-3.1-8B
supports
Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Wellbeing introspection improves from 1B to 3B: ρ from 0.48 to 0.66, R² from 0.26 to 0.45 · finding · 0.881
Confirms scaling trend for wellbeing concept between smallest and middle model size
LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · finding · 0.814
Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.793
Scatter plot visualization showing strengthened probe-report relationship across alpha range
Qwen 2.5 7B-Instruct wellbeing introspection: ρ=0.49, isotonic R²=0.76 (LMM p<10⁻¹⁰) · finding · 0.792
Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity
Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹ · finding · 0.791
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³) · finding · 0.786
Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
Focus→wellbeing: ρ increases from 0.42 (α=-4) to 0.85 (α=+4); R² from 0.34 to 0.75 in LLaMA-3.2-3B · finding · 0.778
Scatter plot visualization of the dramatic tightening of probe-report relationship at extreme steering settings
Pearson-Vogel et al.: accurate self-description prompts increase introspective detection from 0.3% to 39.9% · finding · 0.776
Cited to mechanistically support why the contemplative prompt changes what post-training-shaped final layers allow through
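
The "Related by similarity" list above keeps entities whose embedding has cosine similarity of at least 0.65 with this finding and that have no typed edge to it. A minimal sketch of that filter follows; the function names, the dictionary layout, and the toy vectors are hypothetical, and the actual embedding model and ranking used by this page are not stated here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_by_similarity(query, candidates, typed_edge_ids, threshold=0.65):
    """Return (id, score) pairs at or above the threshold, highest first.

    Entities that already have a typed edge to the query are skipped,
    mirroring the "no typed edge" condition (an assumed reading of the page).
    """
    hits = []
    for entity_id, vector in candidates.items():
        if entity_id in typed_edge_ids:
            continue
        score = cosine(query, vector)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical two-dimensional embeddings for illustration only.
query_vec = [1.0, 0.0]
candidate_vecs = {
    "a": [1.0, 0.0],   # same direction as the query
    "b": [1.0, 1.0],   # 45 degrees from the query, cosine about 0.707
    "c": [0.0, 1.0],   # orthogonal, below the threshold
    "d": [2.0, 0.0],   # similar, but already linked by a typed edge
}
related = related_by_similarity(query_vec, candidate_vecs, typed_edge_ids={"d"})
```

Entries surfaced this way are candidates for review, not confirmed relations: a high score can indicate either a missing edge or a near-duplicate entity.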