finding

active

finding:qwen-2-5-7b-instruct-wellbeing-introspection-0-49-isotonic-r2-0-76-lmm-p-10-10

Qwen 2.5 7B-Instruct wellbeing introspection: ρ=0.49, isotonic R²=0.76 (LMM p<10⁻¹⁰)

Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity
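
For readers unfamiliar with the two headline statistics, the following is a minimal sketch of how a Spearman ρ and an isotonic R² between probe scores and numeric self-reports could be computed. It is not the paper's pipeline: the function names and the toy `probe` and `report` values are hypothetical, the isotonic fit assumes a non-decreasing relationship, and tied probe scores are not pooled.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def isotonic_fit(x, y):
    """Pool-adjacent-violators: non-decreasing fit of y, ordered by x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = []
    for s, c in blocks:
        fitted_sorted.extend([s / c] * c)
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted


def r_squared(y, fitted):
    my = mean(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: probe projections and 1-to-6 self-reports.
probe = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
report = [1, 3, 2, 5, 4, 6]

rho = spearman(probe, report)
r2 = r_squared(report, isotonic_fit(probe, report))
```

Because the isotonic fit is constrained only to be monotone, its R² can be high even when the probe-report relationship is strongly nonlinear, which is one reason the two statistics are reported side by side.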

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Claims (1)

claim

Numeric self-report is a viable, complementary black-box tool for monitoring LLM internal emotive states alongside white-box probe methods
supports
Central practical conclusion; both methods partially track the same latent state but with different failure modes

Findings (1)

finding

Qwen 2.5 7B wellbeing probe: peak Cohen's d=3.5
supports
Strongest cross-family probe; explains clearer introspection in Qwen than Gemma

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.898
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Wellbeing introspection improves from 1B to 3B: ρ from 0.48 to 0.66, R² from 0.26 to 0.45 · finding · 0.861
Confirms scaling trend for wellbeing concept between smallest and middle model size
Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³) · finding · 0.857
Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
Wellbeing introspective strength at turn 1: ρ=0.52, p=5.46×10⁻⁴ in LLaMA-3.2-3B · finding · 0.835
Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.824
Second-strongest pooled introspective coupling in primary model
Qwen 2.5 7B turn-wise introspective fidelity: strong at turn 1 (R²≈0.90) but declines significantly to turn 10 (∆R²=-0.44, p=0.001) · finding · 0.807
Introspective fidelity erodes in Qwen as conversations progress; contrasts with LLaMA-3B trend
LLaMA-3.2-1B impulsivity introspection: ρ=0.21, p<10⁻⁴ (significant but weaker than 3B ρ=0.52) · finding · 0.806
Impulsivity shows significant introspection in 1B but declines in 8B; non-monotonic scaling
Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹ · finding · 0.796
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs