finding

active

finding:llama-3-1-8b-instruct-wellbeing-introspection-0-93-isotonic-r2-0-90-lmm-probe-slope-p-10-10

LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰)

Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
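
For readers unfamiliar with the two headline statistics, the following is a minimal sketch of how a Spearman ρ and an isotonic R² can be computed for paired probe scores and self-reports. It is not the paper's pipeline: the function names, the pure-Python pool-adjacent-violators fit, and the toy numbers are all assumptions made for illustration, and the LMM probe slope is not covered here.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def isotonic_fit(x, y):
    """Least-squares non-decreasing fit of y against x (pool adjacent violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block mean exceeds the new one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = []
    for s, c in blocks:
        fitted_sorted.extend([s / c] * c)
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted


def isotonic_r2(x, y):
    """Share of variance in y explained by the monotone fit on x."""
    fitted = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Toy values, invented for illustration only (not data from the paper).
probe_scores = [0.1, 0.4, 0.2, 0.8, 0.6, 0.9]
self_reports = [1, 3, 2, 5, 6, 7]
print(spearman_rho(probe_scores, self_reports))
print(isotonic_r2(probe_scores, self_reports))
```

Read this way, a high ρ says the self-reports preserve the ordering of the probe scores, and a high isotonic R² says a single monotone curve accounts for most of the variance in the reports.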

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (1)

claim

Introspective capacity scales with model size for some concepts, approaching near-perfect coupling in LLaMA-3.1-8B
supports
Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Qwen 2.5 7B-Instruct wellbeing introspection: ρ=0.49, isotonic R²=0.76 (LMM p<10⁻¹⁰) · finding · 0.898
Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity
LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · finding · 0.862
Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
Wellbeing introspective strength at turn 1: ρ=0.52, p=5.46×10⁻⁴ in LLaMA-3.2-3B · finding · 0.854
Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.848
Second-strongest pooled introspective coupling in primary model
Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³) · finding · 0.841
Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
Wellbeing probe-score drift across turns significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale · finding · 0.837
Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
Wellbeing same-concept steering: LMM alpha slope=0.19, focus=0.40, interest=0.25, impulsivity=0.067 in LLaMA-3.2-3B · finding · 0.837
Quantifies per-concept effect size of same-concept steering on self-report
Wellbeing probe: peak Cohen's d=3.34 (layer 16), p=7.21×10⁻¹³ in LLaMA-3.2-3B · finding · 0.827
Probe validation result confirming wellbeing direction captures meaningful structure
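
As a reminder of what the Cohen's d in the last entry measures, here is a minimal sketch of the standard pooled-standard-deviation form, applied to two groups of scalar probe projections. Whether the paper uses exactly this variant, and how it selects the two groups per layer, is not stated on this page; the function name and the toy inputs are illustrative assumptions.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Toy projections onto a hypothetical probe direction (not data from the paper).
high_wellbeing = [2.0, 4.0, 6.0]
low_wellbeing = [1.0, 3.0, 5.0]
print(cohens_d(high_wellbeing, low_wellbeing))
```

A larger d means the two groups are separated by more pooled standard deviations along the probe direction.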