Qwen 2.5 7B wellbeing probe: peak Cohen's d=3.5

Strongest cross-family probe; explains clearer introspection in Qwen than Gemma
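
For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below shows that standard formula only. The function name and the two input lists (probe scores under two contrasting conditions) are hypothetical, and the paper's exact validation procedure is not stated on this page.

```python
import math
from statistics import mean, variance

def cohens_d(scores_a, scores_b):
    """Standard Cohen's d with a pooled (sample) standard deviation.

    scores_a / scores_b are hypothetical lists of probe scores for two
    contrasting conditions; they are not data from the paper.
    """
    n_a, n_b = len(scores_a), len(scores_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(scores_a) + (n_b - 1) * variance(scores_b)
    ) / (n_a + n_b - 2)
    return (mean(scores_a) - mean(scores_b)) / math.sqrt(pooled_var)
```

On this scale, d=3.5 would mean the two condition means sit about three and a half pooled standard deviations apart.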

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Findings (1)

finding

Qwen 2.5 7B-Instruct wellbeing introspection: ρ=0.49, isotonic R²=0.76 (LMM p<10⁻¹⁰)
supports
Strong introspective coupling in Qwen model; demonstrates cross-family generalization of introspective capacity

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
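
The selection rule described above (cosine similarity at or above 0.65, with no typed edge to this entity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function names, the `embeddings` dictionary, and the `typed_edges` set of ID pairs are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, similarity) pairs, most similar first.

    embeddings: hypothetical dict mapping entity ID -> embedding vector.
    typed_edges: hypothetical set of (id, id) pairs that already share a typed edge.
    """
    results = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already linked to the target by a typed edge (either direction).
        if (target_id, other_id) in typed_edges or (other_id, target_id) in typed_edges:
            continue
        score = cosine(embeddings[target_id], vec)
        if score >= threshold:
            results.append((other_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

The trailing number on each entry below (for example 0.887) reads as the similarity score under this interpretation.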

Gemma 3 4B wellbeing probe: peak Cohen's d=1.8 · finding · 0.887
Weaker cross-family probe; explains weaker introspection in Gemma
Wellbeing probe: peak Cohen's d=3.34 (layer 16), p=7.21×10⁻¹³ in LLaMA-3.2-3B · finding · 0.883
Probe validation result confirming wellbeing direction captures meaningful structure
Interest probe: peak Cohen's d=1.67 (layer 14), p=9.45×10⁻⁶ in LLaMA-3.2-3B · finding · 0.807
Probe validation result confirming interest direction captures meaningful structure
Impulsivity probe: peak Cohen's d=3.60 (layer 13), p=3.58×10⁻¹³ in LLaMA-3.2-3B · finding · 0.788
Strongest probe validation result; highest Cohen's d among the four concepts
Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · 0.776
Normalized probe-score drift across turns generalizes beyond LLaMA family
Wellbeing probe-score drift across turns is significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale · finding · 0.766
Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.765
Parameters don't predict scores; 135x more parameters yields 60% lower score
Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5 · finding · 0.761
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality