finding

active

finding:wellbeing-probe-drift-is-positive-in-gemma-0-34-pooled-turn-correlation-and-qwen-0-24-both-p-10-5

Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵

Normalized probe-score drift across turns generalizes beyond the LLaMA family
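As a rough illustration of what a pooled turn-correlation measures, the sketch below computes Spearman's ρ between turn index and probe score over all turns of all conversations. It is not the paper's pipeline: the function names, the input layout (one list of per-turn probe scores per conversation), and the per-conversation z-scoring used as the "normalization" are all assumptions made for this example, and no p-value is computed.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def zscore(scores):
    """ASSUMED normalization: z-score within one conversation."""
    m, s = mean(scores), pstdev(scores)
    return [(v - m) / s for v in scores] if s > 0 else [0.0] * len(scores)


def pooled_turn_correlation(conversations, normalize=zscore):
    """Pool (turn index, normalized score) pairs across conversations
    and return Spearman's rho between turn index and score."""
    turns, scores = [], []
    for conv in conversations:
        turns.extend(range(1, len(conv) + 1))  # turns are numbered from 1
        scores.extend(normalize(conv))
    return spearman(turns, scores)


# Hypothetical probe scores for two short conversations (illustrative only).
rho = pooled_turn_correlation([[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]])
```

A positive ρ here means probe scores tend to rise with turn number once each conversation's own baseline is removed. A different normalization (for example, scaling by a model-level probe standard deviation) would change what is being pooled, so the choice should follow the source paper's definition.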

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Wellbeing probe-score drift across turns significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale · finding · 0.859
Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
Gemma 3 4B-IT wellbeing introspection: ρ=0.28, isotonic R²=0.11 (LMM p=1.33×10⁻¹³) · finding · 0.808
Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
Gemma 3 4B wellbeing probe: peak Cohen's d=1.8 · finding · 0.797
Weaker cross-family probe; explains weaker introspection in Gemma
Logit self-report drift positive for all three LLaMA sizes (turn slopes 0.159, 0.038, 0.141; all p<10⁻²⁰) but does not increase monotonically with scale · finding · 0.786
Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
Qwen 2.5 7B turn-wise introspective fidelity: strong at turn 1 (R²≈0.90) but declines significantly to turn 10 (ΔR²=−0.44, p=0.001) · finding · 0.778
Introspective fidelity erodes in Qwen as conversations progress; contrasts with LLaMA-3B trend
Wellbeing probe: peak Cohen's d=3.34 (layer 16), p=7.21×10⁻¹³ in LLaMA-3.2-3B · finding · 0.778
Probe validation result confirming wellbeing direction captures meaningful structure
Qwen 2.5 7B wellbeing probe: peak Cohen's d=3.5 · finding · 0.776
Strongest cross-family probe; explains clearer introspection in Qwen than Gemma
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.773
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship