finding
active
finding:wellbeing-same-concept-steering-lmm-alpha-slope-0-19-focus-0-40-interest-0-25-impulsivity-0-067-in-llama-3-2-3b
Wellbeing same-concept steering: LMM alpha slope=0.19, focus=0.40, interest=0.25, impulsivity=0.067 in LLaMA-3.2-3B
Quantifies per-concept effect size of same-concept steering on self-report
Source paper
extracted_from (2026) · Nicolas Martorell · Bruno Bianchi
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Causal confirmation that coupling between self-report and internal state is genuine; steering toward positive pole increases report
- Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
- Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
- LLaMA-3.1-8B-Instruct wellbeing introspection: ρ=0.93, isotonic R²=0.90 (LMM probe slope p<10⁻¹⁰) · finding · 0.837 · Near-ceiling introspective performance for wellbeing concept in 8B model; nearly deterministic probe-report relationship
- Internal-state drift generalizes across scales; normalized drift also increases significantly with log(model size)
- Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
- Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · finding · 0.812 · Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
- Third-strongest pooled introspective coupling in primary model
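
Two of the neighbors above report Benjamini-Hochberg (BH) corrected q-values. As a rough illustration of how such q-values are usually derived from raw p-values, here is a minimal sketch of the standard BH step-up adjustment. The function name `bh_qvalues` and the example p-values are hypothetical; they are not taken from the source paper, and the paper's own correction procedure may differ in detail.

```python
def bh_qvalues(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by
    # the q-values of all larger p-values (keeps q monotone in p).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical raw p-values for six comparisons (illustrative only).
example_p = [0.001, 0.012, 0.03, 0.2, 0.5, 0.8]
example_q = bh_qvalues(example_p)
```

Under this scheme a comparison "survives" correction at level 0.05 when its q-value is below 0.05, which is the sense in which q≈0.011 survives and q≈0.066 is marginal.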