claim
active
claim:emotion-features-in-llms-are-genuinely-more-persistent-than-variance-matched-random-features-indicating-stateful-emotional-encoding-beyond-autoregressive-dynamics
Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics
Central interpretive claim of the paper supported by multiple convergent analyses
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Findings (5)
finding
- Demonstrates that SAE features more aligned with the emotion subspace are more persistent in Cogito after variance control
- Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
- Quantifies emotion feature persistence above random baseline in Cogito across 240 multi-turn conversations
- Qualitative example of a highly emotional SAE feature with intense negative valence in Kimi self-steering
- Highest-rated emotional SAE feature; self-report describes overwhelming positive emotional valence
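The variance-controlled comparison behind the findings above can be illustrated with a minimal sketch. It is not the paper's method: it assumes lag-1 autocorrelation as the persistence measure and a simple relative tolerance for variance matching, neither of which is stated in this entry. All function names and the toy traces are hypothetical.

```python
from statistics import fmean, pvariance


def lag1_autocorr(xs):
    """Lag-1 autocorrelation, used here as an assumed persistence measure."""
    mu = fmean(xs)
    denom = sum((x - mu) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    num = sum((a - mu) * (b - mu) for a, b in zip(xs, xs[1:]))
    return num / denom


def variance_matched(target, candidates, rel_tol=0.25):
    """Keep candidate traces whose variance is within rel_tol of the target's."""
    tv = pvariance(target)
    return [c for c in candidates if abs(pvariance(c) - tv) <= rel_tol * tv]


def persistence_excess(target, candidates, rel_tol=0.25):
    """Target persistence minus mean persistence of variance-matched controls."""
    matched = variance_matched(target, candidates, rel_tol)
    if not matched:
        raise ValueError("no variance-matched controls found")
    baseline = fmean([lag1_autocorr(c) for c in matched])
    return lag1_autocorr(target) - baseline


# Toy activation traces (synthetic, for illustration only).
emotion_like = ([1.0] * 10 + [-1.0] * 10) * 5   # slow sign changes, variance 1
controls = [
    [1.0, -1.0] * 50,                           # flips every step, variance 1
    [1.0, 1.0, -1.0, -1.0] * 25,                # flips every two steps, variance 1
    [3.0, -3.0] * 50,                           # variance 9, excluded by matching
]

matched = variance_matched(emotion_like, controls)
excess = persistence_excess(emotion_like, controls)
print(len(matched), round(excess, 2))
```

A positive `excess` means the target trace is more persistent than controls of comparable variance. Matching on variance first matters because a high-variance feature can look persistent for reasons unrelated to what it encodes.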
Claims (3)
claim
- Characterizes the temporal dynamics of emotion feature activation in LLMs
- Authors' caveat that persistence of conversational context, rather than persistence of an internal emotion state, could explain the findings
- Claims that agentic self-evaluation provides independent convergent evidence for the emotion-persistence link
Questions (1)
question
- Question raised by Anthropic and partially addressed by this paper's persistence evidence
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The central research question motivating the paper
- Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
- We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. (hypothesis · 0.854) Open hypothesis from the Anthropic paper that motivates this work
- Core open question the paper raises but does not fully resolve
- Proposed explanation for why emotion probes are more persistent than variance-matched random probes
- Interpretive hypothesis offered to explain why emotion features are more persistent
- Novel finding that agentic self-evaluation of emotionality correlates with feature persistence
- Main conclusion about the temporal dynamics of emotion features