question
active
question:are-llm-emotion-states-encoded-only-selectively-in-token-positions-where-they-are-operative-or-in-a-more-complex-non-linear-manner
Are LLM emotion states encoded only selectively in token positions where they are operative, or in a more complex non-linear manner?
Question raised by Anthropic and partially addressed by this paper's persistence evidence
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Claims (1)
claim
- Central interpretive claim of the paper supported by multiple convergent analyses
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.839 · Open hypothesis from the Anthropic paper that motivates this work
- The central research question motivating the paper
- Interpretive hypothesis offered to explain why emotion features are more persistent
- Internal representations encoding emotion concepts in large language models, identified by probing and SAE methods
- Proposed explanation for why emotion probes are more persistent than variance-matched random probes
- Proposed mechanistic explanation for why emotion features are more persistent
- Characterizes the temporal dynamics of emotion feature activation in LLMs
- Main conclusion about the temporal dynamics of emotion features