concept
active
concept:anti-persistence-of-emotion-features
Anti-Persistence of Emotion Features
The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature
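One way to make this definition operational is to compare a feature's mean activation just before a steering pulse with its mean activation just after the pulse ends. The sketch below is only an illustration under stated assumptions: the function names `post_pulse_deviation` and `classify`, the window size, the tolerance, and the synthetic trace are all hypothetical and are not taken from the underlying experiment.

```python
from statistics import mean


def post_pulse_deviation(trace, pulse_start, pulse_end, window):
    """Return mean post-pulse activation minus mean pre-pulse (baseline) activation.

    trace       -- per-token activations of one feature (assumed layout)
    pulse_start -- index of the first pulsed token
    pulse_end   -- index of the first token after the pulse (exclusive end)
    window      -- number of tokens averaged on each side of the pulse
    """
    if pulse_start < window or pulse_end + window > len(trace):
        raise ValueError("trace is too short for the requested window")
    baseline = mean(trace[pulse_start - window:pulse_start])
    after = mean(trace[pulse_end:pulse_end + window])
    return after - baseline


def classify(deviation, tol=0.1):
    """Label a deviation; tol is an arbitrary illustrative threshold."""
    if deviation < -tol:
        return "anti-persistent"   # below baseline after the pulse
    if deviation > tol:
        return "persistent"        # still elevated after the pulse
    return "neutral"


# Synthetic trace (made up for illustration, not experimental data):
# baseline near 1.0, a two-token pulse at 5.0, then a dip below baseline.
trace = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 0.4, 0.5, 0.6, 0.5, 1.0, 1.0]
deviation = post_pulse_deviation(trace, pulse_start=4, pulse_end=6, window=4)
label = classify(deviation)
```

A negative deviation corresponds to the anti-persistence described here, and a positive one to ordinary persistence. A real analysis would also need a control for the baseline persistence of arbitrary probe directions.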
Neighborhood — ranked by edge-count
Questions (1)
question
- Open mechanistic question arising from the causal steering experiment
Concepts (1)
concept
- emotion feature persistence · related_to · The phenomenon that emotion feature activations remain elevated above baseline beyond local token bursts, measurable as long-range correlation
Findings (1)
finding
- Demonstrates that the majority of emotion features show persistent upregulation shortly after a steering pulse
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Analysis showing that lower-rank (more central) PCs of emotion feature activations are more persistent than higher-rank (noisier) PCs
- The property of emotion features maintaining elevated activation well beyond the local token context that triggered them
- Falsifiability test built into the PC analysis design
- Mechanism by which activation of an emotion feature sometimes leads to later suppression of that same feature · question · 0.767 · Identified research gap: the paper observes anti-persistence but has no explanation for it
- Core open question the paper raises but does not fully resolve
- Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Emotion-encoding directions in LLM activation space that can be amplified or suppressed via activation steering to causally drive model behavior
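The selection rule behind this list (cosine ≥ 0.65 and no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the embedding vectors, the entity names used as keys, the edge representation, and the function `untyped_neighbors` are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities similar to `target` that have no typed edge to it.

    embeddings  -- dict of entity name -> vector (hypothetical store)
    typed_edges -- set of (source, destination) pairs, treated as undirected
    """
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    scored = [
        (name, cosine(embeddings[target], vec))
        for name, vec in embeddings.items()
        if name != target and name not in linked
    ]
    # Keep only entities above the threshold, most similar first.
    return sorted(
        [(n, s) for n, s in scored if s >= threshold],
        key=lambda pair: -pair[1],
    )


# Toy two-dimensional embeddings, invented for illustration only.
embeddings = {
    "anti-persistence": [1.0, 0.0],
    "suppression mechanism": [0.8, 0.6],
    "emotion feature persistence": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
}
typed_edges = {("anti-persistence", "emotion feature persistence")}
candidates = untyped_neighbors("anti-persistence", embeddings, typed_edges)
```

In this toy data, the entity that already has a `related_to` edge is excluded even though it is the most similar, which is the behavior the "no typed edge" filter describes.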