method
active
method:5-token-steering-pulse-experiment
5-Token Steering Pulse Experiment
Applies a 5-token steering pulse to each emotion probe and measures how long the causal effect persists, using a contrast z-score over the 200 subsequent tokens
Neighborhood — ranked by edge-count
Findings (2)
finding
- Demonstrates that the majority of emotion features show persistent upregulation shortly after a steering pulse
- Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
Concepts (1)
concept
- The key-value cache from steered tokens is retained during the no-steering continuation, allowing the causal effect of steering to propagate to later tokens
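The cache-retention concept above can be illustrated with a toy model. The following is a minimal sketch, not the experiment's implementation: it assumes a single attention head with random weights, and every name in it (`run_with_pulse`, `pulse_len`, `steer_vec`, the dimensions) is hypothetical. Steering is added only to the first `pulse_len` tokens, yet later unsteered tokens still differ from the baseline because they attend to keys and values cached from the steered tokens.

```python
import numpy as np

def run_with_pulse(inputs, steer_vec, pulse_len, Wq, Wk, Wv):
    """Toy single-head causal attention with a key-value cache.

    The steering vector is added only to the first `pulse_len` tokens.
    Their keys and values stay in the cache for every later token.
    """
    k_cache, v_cache, outputs = [], [], []
    for t, x in enumerate(inputs):
        # Steer the hidden state only while the pulse is active.
        h = x + steer_vec if t < pulse_len else x
        # Cache this token's key and value (steered or not).
        k_cache.append(h @ Wk)
        v_cache.append(h @ Wv)
        q = h @ Wq
        K, V = np.stack(k_cache), np.stack(v_cache)
        # Softmax attention over all cached positions up to t.
        scores = K @ q / np.sqrt(q.shape[0])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        outputs.append(weights @ V)
    return np.stack(outputs)

rng = np.random.default_rng(0)
d, n_tokens, pulse_len = 8, 20, 5          # hypothetical toy sizes
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
inputs = rng.normal(size=(n_tokens, d))
steer_vec = 2.0 * np.ones(d)               # hypothetical steering direction

steered_out = run_with_pulse(inputs, steer_vec, pulse_len, Wq, Wk, Wv)
baseline_out = run_with_pulse(inputs, steer_vec, 0, Wq, Wk, Wv)

# Per-token distance between the steered and unsteered runs.
diff = np.linalg.norm(steered_out - baseline_out, axis=1)
print(diff.round(3))
```

In this one-layer toy the cache is the only path by which the pulse can reach tokens after position `pulse_len`, so any nonzero entry in `diff[pulse_len:]` is attributable to the retained keys and values.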
Methods (2)
method
- 5-token steering pulse · related_to · Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
- Per-(emotion, token) z-score, computed as the injected emotion's activation minus the mean of the 170 other probes and contrasted against a no-steering baseline
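One possible reading of the per-(emotion, token) z-score above is sketched below. The source states only that the mean of the 170 other probes is subtracted and that the result is contrasted against a no-steering baseline; dividing by the standard deviation of the other probes and taking the difference of the two z-scores are assumptions made here, as are the function names and the synthetic data.

```python
import numpy as np

def probe_z(acts, idx):
    """z-score of probe `idx` at each token, relative to the other probes.

    acts: array of shape (n_probes, n_tokens).
    Normalizing by the other probes' standard deviation is an assumption.
    """
    others = np.delete(acts, idx, axis=0)
    return (acts[idx] - others.mean(axis=0)) / others.std(axis=0)

def contrast_z(steered, baseline, idx):
    """Steered-run z-score minus no-steering z-score (assumed contrast)."""
    return probe_z(steered, idx) - probe_z(baseline, idx)

# Synthetic stand-in data: 171 probes (1 injected + 170 others), 200 tokens.
rng = np.random.default_rng(0)
n_probes, n_tokens, injected = 171, 200, 3
baseline = rng.normal(size=(n_probes, n_tokens))
steered = baseline.copy()
# Hypothetical decaying effect of the pulse on the injected probe only.
steered[injected] += 2.0 * np.exp(-np.arange(n_tokens) / 50.0)

z = contrast_z(steered, baseline, injected)
print(z[:5].round(2), z[-5:].round(2))
```

Under this reading, a contrast z-score that stays above zero late in the 200-token window would indicate persistence of the steering effect for that emotion probe.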
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
- Acknowledged limitation of restricting experiments to 64-token completions
- Methodological contribution: used to exclude focus-1B and impulsivity-8B from scaling analysis
- Shows immediate causal effect of steering on emotion feature activation
- Key result demonstrating advantage of stepwise over all-token steering strategy
- Main empirical finding of the concept steering analysis
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work
- Practical finding for optimizing steering setup