hypothesis
active
We hypothesize that emotion states are more persistent because they correspond to genuinely stateful internal representations, not merely local surface content.
Proposed explanation for why emotion probes are more persistent than variance-matched random probes
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Findings (1)
finding
- Cogito emotion probe residual autocorrelation +0.077 above variance-matched controls (p=1.5e-27, 157/171 probes positive)
  associated_with: Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
Claims (1)
claim
- Proposed mechanistic explanation for why emotion features are more persistent
Questions (1)
question
- Core open question the paper raises but does not fully resolve
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Interpretive hypothesis offered to explain why emotion features are more persistent
- Central interpretive claim of the paper supported by multiple convergent analyses
- We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods.
  hypothesis · 0.827 · Open hypothesis from the Anthropic paper that motivates this work
- Core empirical claim distinguishing emotion persistence from generic high-variance probe persistence
- Falsifiability test built into the PC analysis design
- The central research question motivating the paper
- Question raised by Anthropic and partially addressed by this paper's persistence evidence
- Core unresolved confound the paper acknowledges but cannot rule out