hypothesis

active

hypothesis:we-hypothesize-that-emotion-states-are-more-persistent-because-they-correspond-to-genuinely-stateful-internal-representations-not-merely-local-surface-content

We hypothesize that emotion states are more persistent because they correspond to genuinely stateful internal representations, not merely local surface content

Proposed explanation for why emotion probes are more persistent than variance-matched random probes

Source paper

extracted_from

Persistence and Introspection of Emotion Features

Scott Sauers · Imago · Janus · Antra Tessera

Neighborhood — ranked by edge-count

Findings (1)

finding

Cogito emotion probes show residual autocorrelation +0.077 above variance-matched controls (p = 1.5e-27; 157 of 171 probes positive)
associated_with
Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
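
The comparison behind this finding can be illustrated with a minimal sketch. It is not the paper's pipeline: the paper's exact residualization step is not described here, so the sketch simply subtracts the mean lag-1 autocorrelation of control series from that of the probe series. The function names are hypothetical, and shuffled copies of the probe series stand in for variance-matched random probes (an assumption; shuffling preserves variance exactly but is not the same control as a random probe direction).

```python
import random


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a sequence of per-token probe activations."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0  # a constant series carries no persistence signal
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return cov / var


def excess_persistence(probe_acts, control_acts_list):
    """Probe autocorrelation minus the mean autocorrelation of the controls."""
    control_mean = sum(lag1_autocorr(c) for c in control_acts_list) / len(control_acts_list)
    return lag1_autocorr(probe_acts) - control_mean


def variance_matched_shuffles(xs, k, rng):
    """Assumed stand-in control: same values (same variance), token order destroyed."""
    controls = []
    for _ in range(k):
        c = list(xs)
        rng.shuffle(c)
        controls.append(c)
    return controls


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "stateful" series: each value carries over 80% of the previous one.
    acts = [0.0]
    for _ in range(999):
        acts.append(0.8 * acts[-1] + rng.gauss(0.0, 1.0))
    controls = variance_matched_shuffles(acts, 20, rng)
    print("excess persistence:", excess_persistence(acts, controls))
```

A positive value means the probe's activations carry over from token to token more than a control with the same variance. The finding above reports that this holds for 157 of 171 Cogito emotion probes.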

Claims (1)

claim

Emotion may refer to a state, and more stateful concepts in general tend to be more persistent across tokens than non-stateful ones
extends
Proposed mechanistic explanation for why emotion features are more persistent

Questions (1)

question

To what extent is emotion feature persistence driven by genuine internal emotional state versus autoregressive conversational context dynamics?
gates
Core open question the paper raises but does not fully resolve

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Emotion refers to a state concept, so stateful representations in general may be more persistent across tokens. · claim · 0.880
Interpretive hypothesis offered to explain why emotion features are more persistent
Emotion features in LLMs are genuinely more persistent than variance-matched random features, indicating stateful emotional encoding beyond autoregressive dynamics · claim · 0.835
Central interpretive claim of the paper supported by multiple convergent analyses
We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods. · hypothesis · 0.827
Open hypothesis from the Anthropic paper that motivates this work
Emotion probes are more persistent than variance-matched random probes, indicating emotion-specific persistence beyond autoregressive dynamics. · claim · 0.818
Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
If persistence is genuinely related to emotion features, lower PCs of the emotion space (more central, less noisy) should be more persistent; if it is an artifact, noisier PCs should have similar persistence. · hypothesis · 0.809
Falsifiability test built into the PC analysis design
To what extent is there persistence of emotional state beyond what is expected merely from the autoregressive nature of LLMs? · question · 0.804
The central research question motivating the paper
Are LLM emotion states encoded only selectively in token positions where they are operative, or in a more complex non-linear manner? · question · 0.801
Question raised by Anthropic and partially addressed by this paper's persistence evidence
Whether observed persistence reflects a genuine lingering emotion-like state or merely persistent conversational context that produced the emotion-relevant activation · question · 0.793
Core unresolved confound the paper acknowledges but cannot rule out