finding

active

finding:wellbeing-probe-score-drift-across-turns-significant-at-all-three-llama-scales-slopes-0-006-0-005-0-013-for-1b-3b-8b-all-p-10-10-drift-magnitude-increases-with-scale

Wellbeing probe-score drift across turns significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale

Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Concepts (1)

concept

Persona drift
extends
Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · finding · 0.859
Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
Wellbeing probe drift is positive in Gemma (ρ=0.34 pooled turn-correlation) and Qwen (ρ=0.24); both p<10⁻⁵ · finding · 0.859
Normalized probe-score drift across turns generalizes beyond LLaMA family
Logit self-report drift positive for all three LLaMA sizes (turn slopes 0.159, 0.038, 0.141; all p<10⁻²⁰) but does not increase monotonically with scale · finding · 0.844
Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.837
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Wellbeing same-concept steering: LMM alpha slope=0.19, focus=0.40, interest=0.25, impulsivity=0.067 in LLaMA-3.2-3B · finding · 0.837
Quantifies per-concept effect size of same-concept steering on self-report
Wellbeing probe: peak Cohen's d=3.34 (layer 16), p=7.21×10⁻¹³ in LLaMA-3.2-3B · finding · 0.827
Probe validation result confirming wellbeing direction captures meaningful structure
Wellbeing introspective strength at turn 1: ρ=0.52, p=5.46×10⁻⁴ in LLaMA-3.2-3B · finding · 0.814
Demonstrates introspection is present from the first conversation turn without needing multi-turn context
Wellbeing concept: Spearman ρ=0.68, isotonic R²=0.48 in LLaMA-3.2-3B (n=400, p<10⁻²⁶) · finding · 0.791
Second-strongest pooled introspective coupling in primary model