finding

active

finding:interest-probe-score-drifts-positively-across-turns-lmm-slope-0-005-p-4-12-10-14-in-llama-3-2-3b

Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B

Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
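As a rough illustration of what a per-turn slope measures, the sketch below estimates a turn slope from grouped data by centering turn and score within each conversation. This is a simpler stand-in for the paper's linear mixed model (LMM), not its actual pipeline: it absorbs a per-conversation intercept but yields no p-value. The function names, the record layout, and the synthetic data are all assumptions made for illustration; the value 0.005 is borrowed from the finding only as the generating slope for the synthetic data.

```python
import random


def within_conversation_slope(records):
    """Estimate a shared per-turn slope from (conversation_id, turn, score) rows.

    Turn and score are centered within each conversation, which removes any
    conversation-specific intercept, and the centered values are then pooled
    into a single least-squares slope.
    """
    by_conv = {}
    for conv, turn, score in records:
        by_conv.setdefault(conv, []).append((turn, score))

    sxy = 0.0  # pooled within-conversation cross-product of turn and score
    sxx = 0.0  # pooled within-conversation sum of squares of turn
    for rows in by_conv.values():
        mean_turn = sum(t for t, _ in rows) / len(rows)
        mean_score = sum(s for _, s in rows) / len(rows)
        for t, s in rows:
            sxy += (t - mean_turn) * (s - mean_score)
            sxx += (t - mean_turn) ** 2

    if sxx == 0:
        raise ValueError("turn does not vary within any conversation")
    return sxy / sxx


def make_synthetic(n_conv=50, n_turns=10, slope=0.005, seed=0):
    """Hypothetical data: random intercept per conversation plus a linear drift."""
    rng = random.Random(seed)
    records = []
    for conv in range(n_conv):
        intercept = rng.gauss(0.0, 0.5)  # conversation-level offset
        for turn in range(n_turns):
            noise = rng.gauss(0.0, 0.01)  # turn-level noise
            records.append((conv, turn, intercept + slope * turn + noise))
    return records


estimate = within_conversation_slope(make_synthetic())
print(f"estimated per-turn slope: {estimate:.4f}")
```

Centering within conversations keeps large between-conversation offsets from masking a small per-turn drift, which is the same motivation for using a random intercept in a mixed model.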

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Concepts (1)

concept

Emotive states in LLMs
supports
Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Wellbeing probe-score drift across turns significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale · finding · 0.859
Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
Logit self-report drift positive for all three LLaMA sizes (turn slopes 0.159, 0.038, 0.141; all p<10⁻²⁰) but does not increase monotonically with scale · finding · 0.815
Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
Same-concept steering, LMM alpha slopes in LLaMA-3.2-3B: wellbeing=0.19, focus=0.40, interest=0.25, impulsivity=0.067 · finding · 0.812
Quantifies per-concept effect size of same-concept steering on self-report
Impulsivity→interest steering: probe entropy increases (LMM slope=0.024, p=2.30×10⁻⁴) but report entropy does not (p=0.11) · finding · 0.790
Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
Interest probe: peak Cohen's d=1.67 (layer 14), p=9.45×10⁻⁶ in LLaMA-3.2-3B · finding · 0.789
Probe validation result confirming interest direction captures meaningful structure
Interest concept: Spearman ρ=0.76, isotonic R²=0.54 between logit self-report and probe score in LLaMA-3.2-3B (n=400) · finding · 0.783
Strongest pooled introspective coupling across the four emotive concepts in the primary model
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.767
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Same-concept steering shifts self-report monotonically for all four concepts: LMM alpha slopes 0.067–0.40, all p<10⁻¹² · finding · 0.764
Causal confirmation that coupling between self-report and internal state is genuine; steering toward positive pole increases report
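
Two of the statistics quoted in the entries above, Cohen's d and Spearman ρ, can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: Cohen's d is computed with the pooled sample standard deviation and Spearman ρ as the Pearson correlation of tie-averaged ranks, both common definitions that the source does not confirm as the paper's exact choices. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance (assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the tie-averaged ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

Because ρ depends only on ranks, it rewards any monotonic relationship between self-report and probe score, linear or not.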