finding
active
finding:llama-3-1-8b-instruct-wellbeing-introspection-0-93-isotonic-r2-0-90-lmm-probe-slope-p-10-10
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰)
Near-ceiling introspective performance for the wellbeing concept in the 8B model; the probe-report relationship is nearly deterministic
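For readers unfamiliar with the two headline statistics, the following is a minimal sketch of how a rank correlation and an isotonic R² between probe scores and self-reports could be computed. It assumes ρ denotes Spearman's rank correlation and that isotonic R² is the variance explained by a non-decreasing least-squares fit; this record does not state the exact procedure used. All function names and the toy data are hypothetical, and the linear mixed model (LMM) slope test is not covered here.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def isotonic_fit(x, y):
    """Pool-adjacent-violators: non-decreasing least-squares fit of y,
    taken in order of x. Ties in x are not pooled (a simplification)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = [s / c for s, c in blocks for _ in range(c)]
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted

def isotonic_r2(x, y):
    """1 - SS_res / SS_tot for the isotonic fit of y on x."""
    fitted = isotonic_fit(x, y)
    my = mean(y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

if __name__ == "__main__":
    # Made-up illustrative values, not data from the source paper.
    probe = [0.1, 0.4, 0.2, 0.9, 0.7]   # hypothetical probe projections
    report = [1, 3, 2, 5, 4]            # hypothetical self-report ratings
    print(spearman(probe, report), isotonic_r2(probe, report))
```

A high ρ together with a high isotonic R² indicates that self-reports rise with the probe score and that a monotone curve accounts for most of their variance, which is one way to read the "nearly deterministic" description above.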
Source paper
extracted_from · (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity
- LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · finding · 0.862 · Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
- Demonstrates introspection is present from the first conversation turn without needing multi-turn context
- Second-strongest pooled introspective coupling in primary model
- Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
- Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
- Quantifies per-concept effect size of same-concept steering on self-report
- Probe validation result confirming wellbeing direction captures meaningful structure