5-Token Steering Pulse Experiment

Applies a 5-token steering pulse to each emotion probe and measures how long the causal effect persists, using a contrast z-score over the 200 subsequent tokens.
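A minimal sketch of the pulse schedule described above, assuming steering means adding a scaled direction vector to per-token hidden states. The function names, the `strength` value, and the list-of-lists representation are hypothetical illustrations, not the experiment's actual code.

```python
def steering_strengths(pulse_len=5, tail_len=200, strength=1.0):
    """Per-token steering strength: `strength` during the pulse, 0.0 afterwards."""
    return [strength] * pulse_len + [0.0] * tail_len


def apply_pulse(hidden, direction, strengths):
    """Add strengths[i] * direction to the i-th token's hidden vector.

    hidden:    list of per-token vectors (each a list of floats); assumed layout
    direction: steering vector with the same width as each hidden vector
    Returns a new list; the input is left unmodified.
    """
    return [
        [x + s * d for x, d in zip(h, direction)]
        for h, s in zip(hidden, strengths)
    ]


# Toy example: 205 tokens (5 steered + 200 unsteered), 3-dimensional hidden states.
schedule = steering_strengths(pulse_len=5, tail_len=200, strength=2.0)
hidden = [[0.0, 0.0, 0.0] for _ in range(len(schedule))]
steered = apply_pulse(hidden, direction=[1.0, 0.0, 0.0], strengths=schedule)
```

In this toy version only the first five positions are modified directly. In the experiment, any later effect would have to arrive indirectly, through state carried over from the steered tokens.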

Neighborhood — ranked by edge-count

Findings (2)

finding

62% of emotions significantly elevated at 5 tokens after steering pulse ends
introduces
Demonstrates that the majority of emotion features show persistent upregulation shortly after a steering pulse
48 of 171 emotion probes individually significant at token 100 post-steering
introduces
Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
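A related entry on this page describes the significance threshold as BH-corrected. The sketch below shows the standard Benjamini-Hochberg step-up procedure applied to one p-value per probe. The `alpha` level of 0.05 is an assumption, since the source does not state it, and the function name is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Toy example with ten p-values (made-up numbers, not experimental data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
toy_reject = benjamini_hochberg(toy_p, alpha=0.05)
n_significant = sum(toy_reject)
```

Counting the `True` entries at a given token offset would give a tally of the same kind as the per-token counts reported above.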

Concepts (1)

concept

KV State Persistence
uses
The key-value cache from steered tokens is retained during no-steering continuation, allowing causal effect of steering to propagate

Methods (2)

method

5-token steering pulse
related_to
Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
Causal Contrast Z-Score
uses
Per-(emotion, token) z-score computed as injected emotion activation minus mean of 170 other probes, contrasted against no-steering baseline
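One way to read the contrast z-score definition above is sketched below. The contrast follows the description (target probe activation minus the mean of the other probes). The normalization is an assumption, since the source does not spell it out: here the difference in mean contrast between steered and baseline runs is divided by the baseline standard deviation across runs. All names and the nested-list layout are hypothetical.

```python
from statistics import mean, stdev


def contrast(run, target):
    """Per-token contrast for one run.

    run: list over probes, each a list of per-token activations (assumed layout).
    Returns target activation minus the mean of all other probes, per token.
    """
    others = [probe for i, probe in enumerate(run) if i != target]
    n_tokens = len(run[target])
    return [run[target][t] - mean(p[t] for p in others) for t in range(n_tokens)]


def causal_contrast_z(steered_runs, baseline_runs, target):
    """Per-token z-score of the steered contrast against the no-steering baseline.

    Assumed normalization: (mean steered contrast - mean baseline contrast)
    divided by the sample standard deviation of the baseline contrast.
    Needs at least two baseline runs with non-identical contrasts.
    """
    c_steer = [contrast(r, target) for r in steered_runs]
    c_base = [contrast(r, target) for r in baseline_runs]
    z = []
    for t in range(len(c_base[0])):
        base_t = [c[t] for c in c_base]
        steer_t = [c[t] for c in c_steer]
        z.append((mean(steer_t) - mean(base_t)) / stdev(base_t))
    return z


# Toy data: 3 runs, 3 probes, 2 tokens (made-up numbers, not experimental data).
baseline_runs = [
    [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [1.0, 1.0], [1.0, 1.0]],
]
# Steered runs: identical except probe 0 is raised by 3.0 at token 0.
steered_runs = [
    [[p[0] + 3.0, p[1]] if i == 0 else list(p) for i, p in enumerate(run)]
    for run in baseline_runs
]
z_scores = causal_contrast_z(steered_runs, baseline_runs, target=0)
```

With 171 probes, the `others` list would hold the 170 remaining probes mentioned in the definition. Subtracting their mean removes shifts that affect all probes equally.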

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

All-token steering · method · 0.800
Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
Do psychological steering results hold beyond 64-token completions? · question · 0.744
Acknowledged limitation of restricting experiments to 64-token completions
The steering-sign test functions as a practical probe-validation criterion: inverted report changes when steering suspect probe quality · claim · 0.729
Methodological contribution: used to exclude focus-1B and impulsivity-8B from scaling analysis
At 5 tokens after steering pulse ends, 130 of 171 (62%) emotion features are BH-significantly elevated; 14% are suppressed. · finding · 0.726
Shows immediate causal effect of steering on emotion feature activation
Stepwise steering achieves over 5% accuracy improvement compared to all-token intervention at similar token budget · finding · 0.723
Key result demonstrating advantage of stepwise over all-token steering strategy
Concept steering experiments identify three distinct operational regimes across clinical concepts in EEG foundation models. · finding · 0.719
Main empirical finding of the concept steering analysis
Steering vectors discover effective triggers such as 'However' and 'Otherwise', consistent with prior reported reflection datasets · finding · 0.714
Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
Distributing steering strength across multiple layers (6 layers at 0.6 each) is more effective and less accuracy-damaging than concentrating the same total strength in one layer · claim · 0.712
Practical finding for optimizing steering setup.