finding

active

finding:wellbeing-same-concept-steering-lmm-alpha-slope-0-19-focus-0-40-interest-0-25-impulsivity-0-067-in-llama-3-2-3b

Wellbeing same-concept steering: LMM alpha slope=0.19, focus=0.40, interest=0.25, impulsivity=0.067 in LLaMA-3.2-3B

Quantifies per-concept effect size of same-concept steering on self-report

Source paper

extracted_from

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

(2026) · Nicolas Martorell · Bruno Bianchi

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
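The selection rule described above (cosine similarity of at least 0.65, with no typed edge to this entity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function names, the embedding dictionary, and the representation of typed edges as unordered ID pairs are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs with score >= threshold and no typed edge.

    embeddings:  dict mapping entity ID -> embedding vector (hypothetical store)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a typed relation
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first, matching the ordering of the list on this page
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy data, purely illustrative
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.1],
    "c": [0.0, 1.0],
    "d": [1.0, 0.2],
}
toy_edges = {frozenset(("a", "d"))}
neighbors = related_by_similarity("a", toy_embeddings, toy_edges)
```

In the toy data, "d" is close to "a" but is excluded because the pair already has a typed edge, and "c" falls below the threshold, so only "b" is returned.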

Same-concept steering shifts self-report monotonically for all four concepts: LMM alpha slopes 0.067–0.40, all p<10⁻¹² · finding · 0.867
Causal confirmation that coupling between self-report and internal state is genuine; steering toward positive pole increases report
Cross-concept steering: impulsivity→interest R² increases from 0.55 (α=-4) to 0.72 (α=+4), ∆R²=0.10, p=0.012 in LLaMA-3.2-3B · finding · 0.848
Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
Cross-concept steering: focus→wellbeing R² increases from 0.30 (α=-4) to 0.76 (α=+4), ∆R²=0.30, p<0.001 in LLaMA-3.2-3B · finding · 0.842
Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.837
Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
Wellbeing probe-score drift across turns significant at all three LLaMA scales (slopes=0.006, 0.005, 0.013 for 1B, 3B, 8B; all p<10⁻¹⁰); drift magnitude increases with scale · finding · 0.837
Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
Impulsivity→interest steering: probe entropy increases (LMM slope=0.024, p=2.30×10⁻⁴) but report entropy does not (p=0.11) · finding · 0.813
Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · finding · 0.812
Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
Impulsivity concept: Spearman ρ=0.51, isotonic R²=0.31 in LLaMA-3.2-3B (n=400, p<10⁻¹²) · finding · 0.809
Third-strongest pooled introspective coupling in primary model
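
Two of the cross-concept findings listed above report Benjamini-Hochberg (BH) corrected q-values alongside raw p-values. As a rough sketch of how such q-values are usually computed, the standard BH step-up adjustment is shown below. The function name and the input p-values are hypothetical; the paper's full set of tests is not given on this page, so the q-values quoted above cannot be reproduced from this example.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values, for illustration only
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

A test is then declared significant at false discovery rate 0.05 when its q-value is at most 0.05, which is why a raw p-value that looks significant can end up marginal after correction.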