method
active
method:5-token-steering-pulse
5-token steering pulse
Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
Neighborhood — ranked by edge-count
Concepts (1)
concept
- The causal steering experiment persists KV state over steered tokens so downstream effects can be observed without continued steering
Methods (1)
method
- 5-Token Steering Pulse Experiment · related_to · Applies a 5-token steering pulse to each emotion probe and measures the persistence of the causal effect via a contrast z-score over the 200 subsequent tokens
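The pulse-then-observe procedure described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the experiment's actual code: a single decaying scalar `state` stands in for the persisted KV cache, `run_turn`, `contrast_z`, `decay`, and `noise` are hypothetical names and parameters, and the "contrast z-score" is assumed here to mean the steered mean minus the baseline mean, divided by the baseline standard deviation at each token position. Only the pulse length (5) and the observation window (200) come from the text.

```python
import random
import statistics

PULSE_LEN = 5    # steered tokens at the start of the turn (from the text)
HORIZON = 200    # unsteered tokens observed afterwards (from the text)


def run_turn(strength, seed, decay=0.98, noise=0.1):
    """Toy stand-in for one model turn: returns one probe activation per token.

    `state` plays the role of the persisted KV cache: it carries the effect
    of the steered tokens forward, so later tokens are influenced without any
    further steering. `decay` and `noise` are invented toy parameters.
    """
    rng = random.Random(seed)
    state = 0.0
    activations = []
    for t in range(PULSE_LEN + HORIZON):
        push = strength if t < PULSE_LEN else 0.0  # steer only during the pulse
        state = decay * state + push + rng.gauss(0.0, noise)
        activations.append(state)
    return activations


def contrast_z(steered_runs, baseline_runs):
    """Per-token z-score of the steered mean against the baseline runs."""
    scores = []
    for t in range(PULSE_LEN + HORIZON):
        base = [run[t] for run in baseline_runs]
        steer = [run[t] for run in steered_runs]
        spread = statistics.stdev(base)
        scores.append((statistics.mean(steer) - statistics.mean(base)) / spread)
    return scores


baseline = [run_turn(0.0, seed) for seed in range(50)]
steered = [run_turn(1.0, seed + 1000) for seed in range(50)]
z = contrast_z(steered, baseline)
post_pulse = z[PULSE_LEN:]  # the 200 tokens generated without steering
```

In this toy setting `post_pulse` starts high and decays toward zero, which is the shape a persistence measurement would trace if the effect of the pulse fades over the window. How quickly a real model's effect fades is an empirical question the sketch does not answer.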
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrates that the majority of emotion features show persistent upregulation shortly after a steering pulse
- Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
- Shows that causal steering effects persist over long ranges for a substantial fraction of emotion probes
- Shows immediate causal effect of steering on emotion feature activation
- Acknowledged limitation of restricting experiments to 64-token completions
- Demonstrates distributed steering is more effective and less accuracy-damaging than concentrated steering.
- Practical finding for optimizing steering setup.
- Modifying model behavior by clamping SAE feature activations to specific values during forward pass.
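For the last entry in the list above, clamping an SAE feature can be sketched roughly as follows. Everything here is illustrative: the 2-dimensional activation, the 3-feature encoder and decoder weights, and the `clamp_feature` helper are invented for the example, and adding the SAE's reconstruction error back after decoding is an assumed design choice rather than something the source states.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Toy 2-d activation with a 3-feature SAE; all weights are invented.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # features x d_model
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # d_model x features


def clamp_feature(x, index, value):
    """Return activation x with SAE feature `index` forced to `value`."""
    feats = relu(matvec(W_ENC, x))
    recon = matvec(W_DEC, feats)
    error = [a - b for a, b in zip(x, recon)]  # part the SAE fails to capture
    feats[index] = value                       # the clamp itself
    patched = matvec(W_DEC, feats)
    return [p + e for p, e in zip(patched, error)]


x = [0.2, -0.3]
steered = clamp_feature(x, index=0, value=4.0)
```

With these toy weights, feature 0 writes only to the first coordinate, so clamping it to 4.0 moves that coordinate to 4.0 and leaves the second coordinate at -0.3.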