finding
active
Same-concept steering shifts self-report monotonically for all four concepts: LMM alpha slopes 0.067–0.40, all p<10⁻¹²
Causal confirmation that the coupling between self-report and internal state is genuine: steering toward a concept's positive pole raises the self-report for that concept.
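A minimal toy sketch of what "self-report shifts monotonically with steering strength" means, under stated assumptions. It is not the paper's pipeline: the hidden states are synthetic Gaussian vectors, the self-report is a hypothetical linear readout, and an ordinary least-squares slope stands in for the mixed-model (LMM) alpha slope. All names (`steer`, `report`, `ols_slope`, `gain`) are illustrative.

```python
import random

def steer(hidden, direction, alpha):
    """Activation steering: add alpha times the concept direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def report(hidden, readout):
    """Toy self-report: a linear readout of the (steered) hidden state."""
    return sum(h * r for h, r in zip(hidden, readout))

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (stand-in for the LMM alpha slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)
dim = 16
gain = 0.5                                  # hypothetical report sensitivity
direction = [1.0 / dim ** 0.5] * dim        # unit-norm concept direction (assumed)
readout = [gain * d for d in direction]     # assumption: report reads the same direction

# The same base states are reused at every alpha (a paired design).
base_states = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
alphas = [-4, -2, 0, 2, 4]

xs, ys, mean_reports = [], [], []
for alpha in alphas:
    vals = [report(steer(h, direction, alpha), readout) for h in base_states]
    mean_reports.append(sum(vals) / len(vals))
    xs.extend([alpha] * len(vals))
    ys.extend(vals)

slope = ols_slope(xs, ys)
print("mean report per alpha:", [round(m, 3) for m in mean_reports])
print("estimated alpha slope:", round(slope, 3))
```

In this toy the slope equals `gain` by construction, because the readout is aligned with the steering direction. In a real model the slope is an empirical quantity and can differ per concept, as the reported range 0.067–0.40 indicates.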
Source paper
extracted_from · (2026) · Nicolas Martorell · Bruno Bianchi
Neighborhood — ranked by edge-count
Claims (2)
claim
- The coupling between LLM self-report and internal emotive state is causal, not merely correlational · associated_with · supports · Supported by same-concept steering experiments showing monotonic shifts in self-report with activation steering
- Addresses skeptical alternative that reports reflect only conversational content
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Quantifies per-concept effect size of same-concept steering on self-report
- Unlike probe drift, report drift magnitude does not follow a clean scaling law; size-slope is negative
- Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
- Second significant cross-concept introspection improvement; marginal after BH correction (q≈0.066)
- Key methodological insight: introspection enables a new probe validation criterion beyond conventional separation metrics
- Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
- Interest probe score drifts positively across turns: LMM slope=0.005, p=4.12×10⁻¹⁴ in LLaMA-3.2-3B · finding · 0.764 · Demonstrates genuine internal-state dynamics in LLMs during multi-turn conversation
- The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.