All-token steering

Baseline steering method that applies the intervention at every token-generation step; shown to degrade performance at high intervention strengths

Neighborhood — ranked by edge-count

method

Stepwise steering
extends
Novel method that applies intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
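
To make the contrast between the two strategies concrete, here is a minimal sketch in plain Python. It is not the source's implementation. The names `steering_mask` and `steer` are hypothetical, and the sketch assumes that `\n\n` arrives as a single token, that the intervention lands on the generation step right after the delimiter, and that the very first step also opens a thinking step.

```python
from typing import List, Sequence

STEP_DELIMITER = "\n\n"  # thinking-step boundary named in the description above


def steering_mask(tokens: Sequence[str], mode: str) -> List[bool]:
    """Return, for each generation step, whether the intervention is applied.

    mode="all_token": steer at every step (the baseline).
    mode="stepwise":  steer only on the step that opens a new thinking step,
                      assumed here to be step 0 and any step right after "\n\n".
    """
    if mode == "all_token":
        return [True] * len(tokens)
    if mode == "stepwise":
        return [i == 0 or tokens[i - 1] == STEP_DELIMITER
                for i in range(len(tokens))]
    raise ValueError(f"unknown mode: {mode!r}")


def steer(hidden: Sequence[float], direction: Sequence[float],
          strength: float) -> List[float]:
    """Additive steering on one hidden state: h + strength * direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Hypothetical token stream containing two step boundaries.
    tokens = ["First", ",", "\n\n", "Then", "x", "\n\n", "So"]
    for mode in ("all_token", "stepwise"):
        mask = steering_mask(tokens, mode)
        print(mode, sum(mask), "of", len(tokens), "steps steered")
```

On this toy stream the baseline intervenes on all 7 steps and the stepwise variant on 3, which is where the reduced cost of stepwise steering would come from under these assumptions.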

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

5-Token Steering Pulse Experiment · method · 0.800
Applies a 5-token steering pulse to each emotion probe and measures persistence of causal effect via contrast z-score over 200 subsequent tokens
5-token steering pulse · method · 0.792
Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure downstream persistence of emotion feature activation
Stepwise steering preserves accuracy while reducing cost, whereas all-token steering causes significant degradation at large intervention strengths · claim · 0.785
Comparative claim between the two steering strategies
Activation Steering · method · 0.772
Causal intervention technique: edit NLA explanation, reconstruct via AR, use difference as steering vector to manipulate model behavior.
Representation Steering · concept · 0.766
Parent concept; the practice of controlling neural network outputs by manipulating internal representations.
Endogenous Steering Resistance · concept · 0.766
The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
steering (intervention on internals) · concept · 0.764
General technique of modifying activations to control model behavior.
direction-based steering · concept · 0.759
Paradigm of finding the right direction in activation space (e.g., linear steering).