question

active

question:are-llm-emotion-states-encoded-only-selectively-in-token-positions-where-they-are-operative-or-in-a-more-complex-non-linear-manner

Are LLM emotion states encoded only selectively in token positions where they are operative, or in a more complex non-linear manner?

Question raised by Anthropic and partially addressed by this paper's persistence evidence

Source paper

extracted_from

Persistence and Introspection of Emotion Features

Scott Sauers · Imago · Janus · Antra Tessera

Neighborhood — ranked by edge-count

Claims (1)

claim

Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics
answered_by
Central interpretive claim of the paper supported by multiple convergent analyses
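
The persistence comparison in the claim above can be made concrete with a small sketch. This entry does not give the paper's actual statistic or its variance-matching procedure, so the code below is only an assumption-laden illustration: it uses lag autocorrelation as a stand-in persistence measure, and the function name `lag_autocorr` and both toy traces are hypothetical. The two traces share the same mean and variance by construction, so any difference in the score reflects temporal structure rather than scale.

```python
from statistics import mean


def lag_autocorr(trace, lag):
    """Sample autocorrelation of a 1-D per-token activation trace at a given lag."""
    if not 0 < lag < len(trace):
        raise ValueError("lag must be between 1 and len(trace) - 1")
    mu = mean(trace)
    centered = [x - mu for x in trace]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant trace: autocorrelation is undefined, report 0
    num = sum(centered[t] * centered[t + lag] for t in range(len(trace) - lag))
    return num / denom


# Toy traces, made up for illustration (not data from the paper).
# Both contain eight 0s and eight 1s, so mean and variance are identical.
slow_trace = [0, 0, 0, 0, 1, 1, 1, 1] * 2  # changes rarely: state-like
fast_trace = [0, 1] * 8                    # flips every token: local

print("slow:", lag_autocorr(slow_trace, 1))
print("fast:", lag_autocorr(fast_trace, 1))
```

Under this stand-in measure, a state-like feature scores higher at short lags than an equally variable feature that changes every token, which is the shape of the contrast the claim describes.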

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.839
Open hypothesis from the Anthropic paper that motivates this work
To what extent is there persistence of emotional state beyond what is expected merely from the autoregressive nature of LLMs? · question · 0.830
The central research question motivating the paper
Emotion refers to a state concept, so stateful representations in general may be more persistent across tokens. · claim · 0.810
Interpretive hypothesis offered to explain why emotion features are more persistent
Emotion Features in LLMs · concept · 0.802
Internal representations encoding emotion concepts in large language models, identified by probing and sparse autoencoder (SAE) methods
We hypothesize that emotion states are more persistent because they correspond to genuinely stateful internal representations, not merely local surface content. · hypothesis · 0.801
Proposed explanation for why emotion probes are more persistent than variance-matched random probes
Emotion may refer to a state, and more stateful concepts in general tend to be more persistent across tokens than non-stateful ones. · claim · 0.789
Proposed mechanistic explanation for why emotion features are more persistent
Emotions are not strictly locally scoped but instead bursty with a long tail of slow change persisting over 100 tokens. · claim · 0.788
Characterizes the temporal dynamics of emotion feature activation in LLMs
Emotion features are not strictly locally scoped; they are bursty with a long tail of slow change persisting over 100 tokens. · claim · 0.787
Main conclusion about the temporal dynamics of emotion features
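
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration under stated assumptions, not the graph's actual implementation: the function names, the entity ids, and the two-dimensional embeddings are all hypothetical, and typed edges are assumed to be stored as undirected pairs.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the similarity threshold that share no typed edge with target."""
    # Collect everything already linked to the target by a typed edge.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical toy data for illustration only.
embeddings = {
    "question": [1.0, 0.0],
    "claim_a": [0.9, 0.1],    # similar, but already linked by a typed edge
    "claim_b": [0.8, 0.6],    # similar and unlinked: a candidate
    "concept_c": [0.0, 1.0],  # dissimilar
}
typed_edges = [("question", "claim_a")]
neighbors = similar_untyped("question", embeddings, typed_edges)
print(neighbors)
```

Excluding already-linked entities is what makes the remaining hits useful as candidates for new edges or as possible duplicates, as the section description notes.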