finding

active

finding:62-of-emotions-significantly-elevated-at-5-tokens-after-steering-pulse-ends

62% of emotions significantly elevated at 5 tokens after steering pulse ends

Demonstrates that the majority of emotion features show persistent upregulation shortly after a steering pulse

Source paper

extracted_from

Persistence and Introspection of Emotion Features

Scott Sauers · Imago · Janus · Antra Tessera

Neighborhood — ranked by edge-count

Claims (1)

claim

Emotions are not strictly locally scoped but instead bursty with a long tail of slow change persisting over 100 tokens
supports
Characterizes the temporal dynamics of emotion feature activation in LLMs

Methods (1)

method

5-Token Steering Pulse Experiment
introduces
Applies a 5-token steering pulse to each emotion probe and measures persistence of causal effect via contrast z-score over 200 subsequent tokens
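The measurement described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the source does not define the contrast z-score, so the sketch assumes it is the steered-minus-control difference in mean probe activation at each token offset, divided by its standard error. The function name `contrast_z`, the trial counts, and the synthetic decaying effect are all hypothetical.

```python
import math
import random
from statistics import mean, variance


def contrast_z(steered, control):
    """Per-offset z-score of the steered-minus-control mean difference.

    steered, control: lists of trials, each a list of probe activations
    indexed by token offset after the pulse ends.
    ASSUMPTION: this Welch-style definition is illustrative only; the
    source does not spell out how its contrast z-score is computed.
    """
    n_s, n_c = len(steered), len(control)
    z = []
    # zip(*trials) transposes trials-by-offset into offset-by-trials.
    for s_col, c_col in zip(zip(*steered), zip(*control)):
        diff = mean(s_col) - mean(c_col)
        se = math.sqrt(variance(s_col) / n_s + variance(c_col) / n_c)
        z.append(diff / se)
    return z


# Synthetic data (hypothetical): unit-variance noise plus an effect that
# decays exponentially over the 200 tokens following the pulse.
rng = random.Random(0)
n_trials, n_tokens = 100, 200
control = [[rng.gauss(0, 1) for _ in range(n_tokens)] for _ in range(n_trials)]
steered = [
    [rng.gauss(0, 1) + math.exp(-t / 20) for t in range(n_tokens)]
    for _ in range(n_trials)
]
z = contrast_z(steered, control)
```

Reading `z` at a fixed offset (for example 5 or 100 tokens after the pulse) gives one test statistic per probe. Repeating this for every probe yields the set of statistics that a multiple-comparison correction would then be applied to.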

Concepts (1)

concept

Anti-Persistence of Emotion Features
supports
The phenomenon where activating an emotion feature leads to subsequent below-baseline activation of that feature

Findings (1)

finding

At 5 tokens after the steering pulse ends, 130 of 171 emotion features are BH-significant: 62% are elevated and 14% are suppressed.
restates
Shows immediate causal effect of steering on emotion feature activation
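"BH-significantly" presumably refers to the Benjamini-Hochberg false discovery rate procedure applied across the 171 per-feature tests. A minimal sketch of that step-up procedure follows. The significance level `alpha=0.05` and the example p-values are assumptions, since the source states neither the level used nor how the per-feature p-values were obtained.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, in ascending p order)
    with p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    ASSUMPTION: alpha=0.05 is illustrative; the source does not state it.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per feature.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p)
```

Under this reading, a feature would count as elevated or suppressed according to the sign of its effect once its p-value survives the correction.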

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

48 of 171 emotion probes individually significant at token 100 post-steering · finding · 0.829
Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
At 100 tokens post-steering, 48 of 171 emotion features remain individually BH-significant despite average effect being near zero. · finding · 0.824
Demonstrates long-tail persistence of causal steering effect in a subset of emotion features
5-token steering pulse · method · 0.803
Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
Do psychological steering results hold beyond 64-token completions? · question · 0.768
Acknowledged limitation of restricting experiments to 64-token completions
Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.718
Shows low agreement between the two evaluation modalities
Cogito emotion probe residual autocorrelation +0.077 above variance-matched controls (p=1.5e-27, 157/171 probes positive) · finding · 0.718
Demonstrates that Cogito emotion probes are persistently active beyond what is explained by their variance alone
Impulsivity→interest steering: probe entropy increases (LMM slope=0.024, p=2.30×10⁻⁴) but report entropy does not (p=0.11) · finding · 0.715
Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
0% multi-attempt responses across 7,892 no-steering baseline trials confirming ESR is steering-induced · finding · 0.715
Control result establishing that self-correction is specifically induced by steering, not spontaneous model behavior

Restated by (1)

cosine ≥ 0.90

Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.

finding
At 5 tokens after the steering pulse ends, 130 of 171 emotion features are BH-significant: 62% are elevated and 14% are suppressed.