All-token steering
method · active · method:all-token-steering
A baseline steering method that applies the intervention at every token-generation step; it has been shown to degrade performance at high steering strengths.
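As a minimal sketch of what "at every token generation step" means, the code below assumes the common additive form of activation steering, h' = h + αv, applied to a list of per-token hidden states. The names `add_scaled`, `all_token_steering`, `alpha` and `v` are hypothetical, and a real implementation would apply the update inside the model (for example via a forward hook) rather than to a precomputed list.

```python
from typing import List

Vector = List[float]


def add_scaled(h: Vector, v: Vector, alpha: float) -> Vector:
    """Additive steering update h' = h + alpha * v (assumed form, not source-confirmed)."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def all_token_steering(hidden_states: List[Vector], v: Vector, alpha: float) -> List[Vector]:
    """Apply the same steering update at every token position, with no gating."""
    return [add_scaled(h, v, alpha) for h in hidden_states]


# Toy example: three token positions with 2-d hidden states.
states = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
steered = all_token_steering(states, v=[1.0, -1.0], alpha=2.0)
print(steered)  # [[2.0, -1.0], [3.0, -2.0], [2.5, -1.5]]
```

Every position receives the identical shift αv, including tokens in the middle of a reasoning step; that ungated application is what distinguishes this baseline from step-gated variants.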
Neighborhood (ranked by edge count)
Methods (1)
- Stepwise steering (method; relation: extends). A novel method that applies the intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token.
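The gating that stepwise steering adds can be sketched as a boolean mask over token positions. This is an illustrative sketch only: it assumes the additive update h' = h + αv, assumes `\n\n` is emitted as its own token, and assumes the update is applied at the delimiter position itself (applying it at the token right after the delimiter would be an equally plausible reading). The helper names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]

STEP_DELIMITER = "\n\n"  # thinking-step boundary named in the description


def step_boundary_mask(tokens: Sequence[str], delimiter: str = STEP_DELIMITER) -> List[bool]:
    """Mark positions whose token is the step delimiter (assumed steering sites)."""
    return [tok == delimiter for tok in tokens]


def stepwise_steering(
    hidden_states: List[Vector], tokens: Sequence[str], v: Vector, alpha: float
) -> List[Vector]:
    """Add alpha * v only at step-boundary positions; copy all other states unchanged."""
    mask = step_boundary_mask(tokens)
    return [
        [hi + alpha * vi for hi, vi in zip(h, v)] if is_boundary else list(h)
        for h, is_boundary in zip(hidden_states, mask)
    ]


tokens = ["First", "\n\n", "Second", "\n\n", "Third"]
states = [[0.0, 0.0] for _ in tokens]
steered = stepwise_steering(states, tokens, v=[1.0, 0.0], alpha=3.0)
print(sum(step_boundary_mask(tokens)))  # 2
print(steered)  # [[0.0, 0.0], [3.0, 0.0], [0.0, 0.0], [3.0, 0.0], [0.0, 0.0]]
```

Only 2 of the 5 positions are modified here, whereas ungated all-token steering would modify all 5. With real tokenizers, `\n\n` may be merged with neighbouring text, so the boundary test would need to match on the decoded string instead.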
Related by similarity (8)
cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Applies a 5-token steering pulse to each emotion probe and measures the persistence of the causal effect via a contrast z-score over the 200 subsequent tokens.
- Causal intervention: applying a 5-token steering pulse at the start of a model turn to measure the downstream persistence of emotion-feature activation.
- A comparative claim between the two steering strategies.
- Causal intervention technique: edit an NLA explanation, reconstruct it via AR, and use the difference as a steering vector to manipulate model behavior.
- Parent concept: the practice of controlling neural network outputs by manipulating internal representations.
- The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs.
- General technique of modifying activations to control model behavior.
- Paradigm of finding the right direction in activation space (e.g., linear steering).
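The pulse-and-persistence measurement described in the first two similarity entries can be sketched as follows. The pulse length (5) and window (200) come from those descriptions; the definition of the contrast z-score used here, per-token (steered minus baseline) probe activation divided by the baseline's standard deviation over the window, is an assumption made for illustration and may differ from the source's. All function names are hypothetical and the numbers are toy values, not real data.

```python
import statistics
from typing import List, Sequence

PULSE_LEN = 5  # steering pulse length from the description
WINDOW = 200   # number of subsequent tokens tracked for persistence


def pulse_mask(n_tokens: int, start: int = 0, pulse_len: int = PULSE_LEN) -> List[bool]:
    """True for the pulse_len positions beginning at `start` (e.g. the start of a model turn)."""
    return [start <= i < start + pulse_len for i in range(n_tokens)]


def contrast_z(
    steered: Sequence[float], baseline: Sequence[float], start: int, window: int = WINDOW
) -> List[float]:
    """Per-token (steered - baseline) probe activation, scaled by baseline std (assumed definition)."""
    s = steered[start:start + window]
    b = baseline[start:start + window]
    sd = statistics.pstdev(b)
    if sd == 0:
        raise ValueError("baseline activations have zero variance in the window")
    return [(x - y) / sd for x, y in zip(s, b)]


print(pulse_mask(7))  # [True, True, True, True, True, False, False]
# Toy probe activations for the tokens that follow the pulse.
baseline = [0.0, 2.0, 0.0, 2.0]
steered = [2.0, 4.0, 1.0, 2.0]
print(contrast_z(steered, baseline, start=0))  # [2.0, 2.0, 1.0, 0.0]
```

On full-length activation traces, `start` would be set to the turn start plus `PULSE_LEN`, so that the window covers only the tokens after the pulse; a z-score that decays toward zero across the window would indicate that the pulse's effect does not persist.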