finding
active
Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
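The headline values are per-model means taken over concept-model pairs after pairs with an invalid steering sign have been dropped. A minimal sketch of that filter-then-average step follows. The record layout and the `steering_sign_valid` flag are hypothetical, and the validity criterion is an assumption (here: steering along the probe direction moved the self-report in the direction the probe predicts). The sketch does not reproduce the pooled LMM.

```python
from collections import defaultdict
from statistics import fmean


def mean_validated_fidelity(records):
    """Average R² per model size over pairs whose steering sign is valid.

    Each record is a dict with hypothetical keys:
      "model":   model-size label, e.g. "1B"
      "concept": concept name, e.g. "wellbeing"
      "r2":      introspective fidelity (R²) for that concept-model pair
      "steering_sign_valid": assumed flag, True if steering moved the
                             self-report in the direction the probe predicts
    """
    by_model = defaultdict(list)
    for rec in records:
        if rec["steering_sign_valid"]:  # exclude invalid steering-sign pairs
            by_model[rec["model"]].append(rec["r2"])
    # Models whose pairs were all excluded simply do not appear in the result.
    return {model: fmean(values) for model, values in by_model.items()}
```

Filtering before averaging matters: a pair whose steering sign contradicts the probe can still show a high R², which would inflate the mean if kept.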
Source paper
extracted_from (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Introspective capacity scales with model size for some concepts, approaching near-perfect coupling in LLaMA-3.1-8B · associated_with · supports · Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Introspective fidelity erodes in Qwen as conversations progress; contrasts with the LLaMA-3B trend.
- Weaker but still significant introspective coupling in the Gemma model; consistent with lower probe quality.
- Strong introspective coupling in the Qwen model; demonstrates cross-family generalization of introspective capacity.
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · cosine 0.795 · Near-ceiling introspective performance for the wellbeing concept in the 8B model; a nearly deterministic probe-report relationship.
- Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66 · finding · cosine 0.791 · Largest single-step scaling improvement: a sharp introspection gain for interest between the 1B and 3B models.
- Impulsivity introspective fidelity decreases from turn 1 to turn 10: ∆R²=-0.28 in LLaMA-3.2-3B · finding · cosine 0.784 · Temporal trend opposite to that of wellbeing, interest, and focus: introspective fidelity for impulsivity weakens as the conversation progresses.
- LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B, ρ=0.52) · finding · cosine 0.779 · Impulsivity shows significant introspection in 1B, rises in 3B, but declines in 8B; non-monotonic scaling.
- Wellbeing introspection improves from 1B to 3B: ρ from 0.48 to 0.66, R² from 0.26 to 0.45 · finding · cosine 0.777 · Confirms the scaling trend for the wellbeing concept between the smallest and middle model sizes.