hypothesis
active
hypothesis:we-hypothesize-that-persistently-active-emotional-state-representations-exist-in-llms-but-may-be-missed-by-standard-probing-methods
We hypothesize that persistently active emotional state representations exist in LLMs but may be missed by standard probing methods.
Open hypothesis from the Anthropic paper that motivates this work
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Claims (1)
claim
- Main conclusion about the temporal dynamics of emotion features
Concepts (1)
concept
- The prior Anthropic paper whose findings about emotion features in Claude this paper builds upon and extends
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The central research question motivating the paper
- Central interpretive claim of the paper supported by multiple convergent analyses
- Question raised by Anthropic and partially addressed by this paper's persistence evidence
- Proposed explanation for why emotion probes are more persistent than variance-matched random probes
- Qualified positive claim from spatio permutation analysis, in which two cases satisfy all three criteria
- Central thesis statement of the paper
- Interpretive hypothesis offered to explain why emotion features are more persistent
- Primary research hypothesis driving the entire study; operationalized via three criteria.
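
The "Related by similarity" filter above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the function names, the embedding dictionary, the edge representation, and the toy vectors are all assumptions made for the example; only the 0.65 threshold and the "no typed edge" condition come from the page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, similarity) pairs that clear the threshold and
    have no typed edge to the target, most similar first.

    embeddings:  dict mapping entity id -> vector (hypothetical layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs (hypothetical layout)
    """
    target = embeddings[target_id]
    hits = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id:
            continue
        if frozenset((target_id, entity_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
embeddings = {
    "hypothesis": [1.0, 0.0],
    "question": [0.8, 0.6],
    "claim": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
}
typed_edges = {frozenset(("hypothesis", "claim"))}

related = similar_untyped("hypothesis", embeddings, typed_edges)
```

In this toy setup "claim" is excluded because it already has a typed edge, and "unrelated" falls below the threshold, so only "question" is returned as a candidate for a new edge or a duplicate check.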