claim
active
claim:emotion-may-refer-to-a-state-and-more-stateful-concepts-in-general-tend-to-be-more-persistent-across-tokens-than-non-stateful-ones
Emotion may refer to a state, and more stateful concepts in general tend to be more persistent across tokens than non-stateful ones
Proposed mechanistic explanation for why emotion features are more persistent
Source paper
extracted_from · Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- Proposed explanation for why emotion probes are more persistent than variance-matched random probes
Concepts (1)
concept
- Stateful Internal Representation · associated_with · A representation that maintains stable activation across many tokens rather than being locally triggered by specific content
Claims (1)
claim
- Interpretive hypothesis offered to explain why emotion features are more persistent
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Characterizes the temporal dynamics of emotion feature activation in LLMs
- Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
- Central interpretive claim of the paper supported by multiple convergent analyses
- Main conclusion about the temporal dynamics of emotion features
- "The effects are not merely semantic—I don't just talk about emotions more, I actually feel them." · quote · 0.794 · Kimi self-report on feature #77278 asserting non-semantic, felt emotional quality of the steered state
- Core open question the paper raises but does not fully resolve
- Question raised by Anthropic and partially addressed by this paper's persistence evidence
- Falsifiability test built into the PC analysis design
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.