finding
active
finding:62-of-emotions-significantly-elevated-at-5-tokens-after-steering-pulse-ends
62% of emotions significantly elevated at 5 tokens after steering pulse ends
Demonstrates that a majority (62%) of emotion features remain significantly elevated 5 tokens after a steering pulse ends
Source paper
extracted_from · Scott Sauers · Imago · Janus · Antra Tessera
Neighborhood — ranked by edge-count
Claims (1)
claim
- Characterizes the temporal dynamics of emotion feature activation in LLMs
Methods (1)
method
- 5-Token Steering Pulse Experiment · introduces · Applies a 5-token steering pulse to each emotion probe and measures persistence of the causal effect via a contrast z-score over the 200 subsequent tokens
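The measurement described in the method above can be sketched as follows. This is a minimal illustration, not the authors' code: the definition of the contrast z-score (a difference of means between steered and unsteered runs divided by its standard error), the significance threshold of 1.96, the array layout, and the function names `contrast_z` and `fraction_elevated` are all assumptions made for the example.

```python
import numpy as np


def contrast_z(steered, baseline):
    """Per-probe, per-token z-score of steered minus baseline activation.

    Both inputs have shape (n_trials, n_probes, n_tokens), where token
    index 0 is the first token after the steering pulse ends.
    ASSUMPTION: the z-score is a difference of trial means divided by
    its standard error; the source does not spell out the formula.
    """
    diff = steered.mean(axis=0) - baseline.mean(axis=0)
    se = np.sqrt(
        steered.var(axis=0, ddof=1) / steered.shape[0]
        + baseline.var(axis=0, ddof=1) / baseline.shape[0]
    )
    return diff / se


def fraction_elevated(z, lag, threshold=1.96):
    """Fraction of probes whose z-score exceeds `threshold` at `lag` tokens.

    `z` has shape (n_probes, n_tokens); `lag` counts tokens after the
    pulse ends, starting at 1.
    ASSUMPTION: 1.96 is an illustrative cutoff, not the paper's criterion.
    """
    if lag < 1 or lag > z.shape[1]:
        raise ValueError("lag must be between 1 and the number of tokens")
    return float(np.mean(z[:, lag - 1] > threshold))
```

Under these assumptions, a call such as `fraction_elevated(contrast_z(steered, baseline), lag=5)` would give the share of probes still elevated 5 tokens after the pulse, and sweeping `lag` from 1 to 200 would trace the persistence curve.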
Concepts (1)
concept
- The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature
Findings (1)
finding
- Shows immediate causal effect of steering on emotion feature activation
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
- Demonstrates long-tail persistence of causal steering effect in a subset of emotion features
- Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
- Acknowledged limitation of restricting experiments to 64-token completions
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.718 · Shows low agreement between the two evaluation modalities
- Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
- Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
- 0% multi-attempt responses across 7,892 no-steering baseline trials, confirming ESR is steering-induced · finding · 0.715 · Control result establishing that self-correction is specifically induced by steering, not spontaneous model behavior
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.