question
active
question:mechanism-by-which-activation-of-an-emotion-feature-sometimes-leads-to-later-suppression-of-that-same-feature
Mechanism by which activation of an emotion feature sometimes leads to later suppression of that same feature
Identified research gap: the paper observes anti-persistence but has no explanation for it
Source paper
extracted_from: Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Papers (1)
paper
- Persistence and Introspection of Emotion Features (associated_with)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Open mechanistic question arising from the causal steering experiment
- Proposed mechanistic explanation for why emotion features are more persistent
- PCA on 171 emotion probe activations across all tokens to produce ordered linear combinations and test if lower PCs are more persistent
- Main conclusion about the temporal dynamics of emotion features
- The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature
- Interpretive hypothesis offered to explain why emotion features are more persistent
- Characterizes the temporal dynamics of emotion feature activation in LLMs
- Acknowledged alternative explanation that the paper does not rule out