finding
active
finding:wellbeing-probe-drift-is-positive-in-gemma-0-34-pooled-turn-correlation-and-qwen-0-24-both-p-10-5
Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵
Normalized probe-score drift across turns generalizes beyond LLaMA family
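As a rough illustration of what a "pooled turn-correlation" could look like, the sketch below z-scores probe values within each conversation, pools the (turn index, normalized score) pairs across conversations, and computes Spearman's ρ on the pooled pairs. The normalization scheme, the function names, and the toy numbers are all assumptions made for illustration; the source does not specify the exact procedure, and none of the values below come from the paper.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pooled_turn_correlation(conversations):
    """conversations: list of per-turn probe-score lists (hypothetical format).

    Scores are z-scored within each conversation (an assumed normalization),
    then all (turn, score) pairs are pooled before computing Spearman's rho.
    """
    turns, scores = [], []
    for conv in conversations:
        mu, sigma = mean(conv), pstdev(conv)
        if sigma == 0:
            continue  # a constant conversation carries no drift information
        for t, s in enumerate(conv, start=1):
            turns.append(t)
            scores.append((s - mu) / sigma)
    return spearman(turns, scores)


# Toy, made-up probe scores for two short conversations.
toy = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 0.5, 1.5, 2.0],
]
rho = pooled_turn_correlation(toy)
```

A positive `rho` on real data would correspond to probe scores tending to rise with turn index. A significance test (the p-values quoted in the title) would need an additional step that this sketch does not attempt.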
Source paper
extracted_from (2026) · Nicolas Martorell · Bruno Bianchi
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
- Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
- Weaker cross-family probe; explains weaker introspection in Gemma
- Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
- Introspective fidelity erodes in Qwen as conversations progress; contrasts with LLaMA-3B trend
- Probe validation result confirming wellbeing direction captures meaningful structure
- Strongest cross-family probe; explains clearer introspection in Qwen than Gemma
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.773 · Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship