Anti-Persistence of Emotion Features

The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature
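Here is a minimal sketch of the definition above. It assumes a per-token activation trace for one emotion feature and a baseline value taken from unsteered text. It labels the feature's response a fixed number of tokens after a steering pulse ends. The function name, its `offset`, `window`, and `tol` parameters, and the single-trace setup are hypothetical illustrations, not the paper's procedure.

```python
from statistics import mean


def classify_post_pulse(trace, pulse_end, baseline, offset=5, window=1, tol=0.0):
    """Label one feature's activation `offset` tokens after a steering pulse.

    Hypothetical helper: `trace` is a list of per-token activations for one
    emotion feature, `pulse_end` is the index of the last steered token, and
    `baseline` is the feature's mean activation on unsteered text.
    """
    start = pulse_end + offset
    if start + window > len(trace):
        raise ValueError("trace is too short for this offset and window")
    # Average over `window` tokens to smooth single-token noise.
    post = mean(trace[start:start + window])
    if post > baseline + tol:
        return "persistent"       # still above baseline after the pulse
    if post < baseline - tol:
        return "anti-persistent"  # dipped below baseline after being driven up
    return "baseline"             # within tolerance of baseline


# Synthetic trace: feature driven up on tokens 2-4, then falling below its baseline of 0.1.
trace = [0.1, 0.1, 0.9, 0.9, 0.9, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2, 0.05]
print(classify_post_pulse(trace, pulse_end=4, baseline=0.1, tol=0.05))  # anti-persistent
```

A real analysis would aggregate over many pulses and compare against a distribution of baseline values rather than a single number; the tolerance here only stands in for that.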

Neighborhood — ranked by edge-count

Questions (1)

question

Why does activation of an emotion feature sometimes lead to its later suppression?
associated_with
Open mechanistic question arising from the causal steering experiment

Concepts (1)

concept

emotion feature persistence
related_to
The phenomenon that emotion feature activations remain elevated above baseline beyond local token bursts, measurable as long-range correlation
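The entry above says persistence is measurable as long-range correlation, but it does not name the statistic. The sketch below assumes the standard sample autocorrelation of one feature's activation trace at a chosen token lag. Under that assumption, persistence would appear as autocorrelation that stays positive at lags well beyond a local burst. The function name is hypothetical.

```python
from statistics import mean


def lag_autocorrelation(trace, lag):
    """Sample autocorrelation of a per-token activation trace at `lag` tokens.

    Uses the common estimator that divides the lagged covariance by the
    total sum of squares (an assumed choice, not taken from the paper).
    """
    if not 0 < lag < len(trace):
        raise ValueError("lag must be between 1 and len(trace) - 1")
    mu = mean(trace)
    centered = [x - mu for x in trace]
    var = sum(c * c for c in centered)
    if var == 0:
        return 0.0  # constant trace: no variation to correlate
    cov = sum(centered[t] * centered[t + lag] for t in range(len(trace) - lag))
    return cov / var


# A block that stays "on" for four tokens, then "off" for four.
trace = [1, 1, 1, 1, -1, -1, -1, -1]
print([round(lag_autocorrelation(trace, k), 3) for k in (1, 4)])  # [0.625, -0.5]
```

Evaluating this over a range of lags gives a decay curve. A slow decay is what the entry calls persistence.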

Findings (1)

finding

62% of emotions are significantly elevated 5 tokens after the steering pulse ends
supports
Demonstrates that the majority of emotion features show persistent upregulation shortly after a steering pulse
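This page reports the 62% figure but not the test behind it. The sketch below assumes per-trial activations measured 5 tokens after the pulse, collected for both steered and unsteered runs. It counts the fraction of features whose steered mean is significantly above baseline, using a one-sided Welch-style z test. That normal approximation is chosen here only for illustration; the paper's actual test and any multiple-comparison correction are not specified on this page. The feature names and numbers are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def fraction_elevated(steered, baseline, alpha=0.05):
    """Fraction of features significantly above baseline after the pulse.

    steered:  dict feature -> activations 5 tokens after the pulse, one per trial
    baseline: dict feature -> activations at the same position, unsteered trials
    Each list needs at least two trials (stdev requires it). No correction for
    multiple comparisons is applied in this sketch.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    elevated = 0
    for name, post in steered.items():
        base = baseline[name]
        # Welch-style standard error of the difference in means.
        se = (stdev(post) ** 2 / len(post) + stdev(base) ** 2 / len(base)) ** 0.5
        diff = mean(post) - mean(base)
        significant = diff > 0 if se == 0 else diff / se > z_crit
        elevated += significant
    return elevated / len(steered)


steered = {"joy": [1.0, 1.1, 0.9, 1.0], "fear": [0.2, 0.1, 0.3, 0.2]}
baseline = {"joy": [0.0, 0.1, -0.1, 0.0], "fear": [0.2, 0.1, 0.3, 0.2]}
print(fraction_elevated(steered, baseline))  # 0.5
```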

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

PCs of the emotion space and persistence · concept · 0.783
Analysis showing that lower-rank (more central) PCs of emotion feature activations are more persistent than higher-rank (noisier) PCs
Emotional State Persistence · concept · 0.775
The property of emotion features maintaining elevated activation well beyond the local token context that triggered them
If persistence is genuinely related to emotion features, lower PCs of the emotion space (more central, less noisy) should be more persistent; if it is an artifact, noisier PCs should be about as persistent as central ones. · hypothesis · 0.773
Falsifiability test built into the PC analysis design
Mechanism by which activation of an emotion feature sometimes leads to later suppression of that same feature · question · 0.767
Identified research gap: the paper observes anti-persistence but offers no explanation for it
To what extent is emotion feature persistence driven by genuine internal emotional state versus autoregressive conversational context dynamics? · question · 0.764
Core open question the paper raises but does not fully resolve
autoregressive persistence · concept · 0.761
Baseline persistence of any probe direction arising from the autoregressive nature of LLMs, not specific to emotion content
Self-evaluated emotionality and textual evaluation of SAE features predict persistence in opposite directions. · claim · 0.759
Surprising finding that the two evaluation methods diverge in their relationship with persistence
steerable emotion features · concept · 0.759
Emotion-encoding directions in LLM activation space that can be amplified or suppressed via activation steering to causally drive model behavior
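The autoregressive persistence entry above implies a control. Any probe direction inherits some persistence from autoregressive generation, so an emotion feature's persistence means something only relative to that baseline. The sketch below compares one feature direction's lag autocorrelation against directions drawn uniformly at random in activation space. The helper names, the random-direction null, and the synthetic activations are hypothetical illustrations, not the paper's method.

```python
import random
from statistics import mean


def lag_autocorrelation(trace, lag):
    # Standard sample autocorrelation of a 1-D trace at one token lag.
    mu = mean(trace)
    centered = [x - mu for x in trace]
    var = sum(c * c for c in centered)
    if var == 0:
        return 0.0
    cov = sum(centered[t] * centered[t + lag] for t in range(len(trace) - lag))
    return cov / var


def project(acts, direction):
    # Dot each per-token activation vector with a probe direction.
    return [sum(a * d for a, d in zip(vec, direction)) for vec in acts]


def random_unit_direction(dim, rng):
    # A normalized Gaussian sample is uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def persistence_vs_random(acts, feature_dir, lag, n_random=100, seed=0):
    """Score a feature direction's persistence against random probe directions.

    Returns the feature's lag autocorrelation and the fraction of random
    directions it exceeds. Random directions share the autoregressive
    baseline but carry no emotion-specific content.
    """
    rng = random.Random(seed)
    feature_score = lag_autocorrelation(project(acts, feature_dir), lag)
    null = [
        lag_autocorrelation(project(acts, random_unit_direction(len(feature_dir), rng)), lag)
        for _ in range(n_random)
    ]
    beaten = sum(score < feature_score for score in null)
    return feature_score, beaten / n_random


# Synthetic activations: dimension 0 carries a slow block signal, the other 7 are noise.
noise = random.Random(1)
acts = [
    [1.0 if (t // 50) % 2 == 0 else -1.0] + [noise.gauss(0.0, 1.0) for _ in range(7)]
    for t in range(200)
]
score, frac = persistence_vs_random(acts, [1.0] + [0.0] * 7, lag=5)
print(round(score, 3), frac)
```

A feature that beats nearly all random directions is more persistent than the autoregressive baseline alone would predict. A feature near the middle of the null distribution is not.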