claim
active
claim:the-coupling-between-llm-self-report-and-internal-emotive-state-is-causal-not-merely-correlational
The coupling between LLM self-report and internal emotive state is causal, not merely correlational
Supported by same-concept steering experiments in which self-report shifts monotonically with the strength of activation steering
Source paper
extracted_from · (2026) · Nicolas Martorell · Bianchi, Bruno
Neighborhood — ranked by edge-count
Findings (1)
finding
- Same-concept steering shifts self-report monotonically for all four concepts: LMM alpha slopes 0.067–0.40, all p < 10⁻¹²
  associated_with · supports
  Causal confirmation that the coupling between self-report and internal state is genuine; steering toward the positive pole increases the report
Methods (1)
method
- Same-concept steering
  associated_with
  Steering along the same concept direction that is being measured, to test whether shifts in internal state causally affect the model's report of that state
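
The same-concept steering idea above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the random vectors stand in for a real activation and concept direction, `toy_self_report` is a hypothetical readout rather than a model's actual self-report, and the slope is an ordinary least-squares fit, not the LMM fit the finding reports.

```python
import numpy as np


def steer(hidden, direction, alpha):
    """Add alpha times the unit-normalised concept direction to a hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit


def toy_self_report(hidden, readout):
    """Hypothetical stand-in for a self-report score: a squashed linear readout."""
    return float(np.tanh(hidden @ readout))


rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)  # assumed concept direction (illustrative)
hidden = rng.normal(size=dim)     # assumed unsteered activation (illustrative)

# Same-concept setup: the report is read out along the direction being steered.
readout = direction / np.linalg.norm(direction)

# Sweep the steering strength and record the toy report at each value.
alphas = np.linspace(-2.0, 2.0, 9)
reports = np.array(
    [toy_self_report(steer(hidden, direction, a), readout) for a in alphas]
)

# Simplified summary statistics: a least-squares slope and a monotonicity check.
slope = np.polyfit(alphas, reports, 1)[0]
is_monotonic = bool(np.all(np.diff(reports) > 0))
```

In this toy setting the report rises with `alpha` by construction, so it only shows the shape of the test; in a real experiment the steering would be applied to a model's activations and the reports collected from its generated answers.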
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Central practical conclusion; both methods partially track the same latent state but with different failure modes
- We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods.
  hypothesis · 0.794
  Open hypothesis from the Anthropic paper that motivates this work
- Load-bearing operational definition that distinguishes the paper's framework from prior approaches
- Skeptical prior work motivating the need to validate self-reports against internal states rather than taking them at face value
- Directions in activation space associated with contrastive emotive concept pairs studied in this paper as targets for introspection
- The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
- The core interpretive question the paper narrows but cannot definitively answer
- Central interpretive claim of the paper supported by multiple convergent analyses