concept
active
concept:three-operational-regimes-of-steering
Three Operational Regimes of Steering
The three categories of SAE feature behavior under concept steering identified in the paper
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Concept Steering · introduces · Latent intervention technique that manipulates sparse features to steer model predictions toward desired concepts.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Main empirical finding of the concept steering analysis
- Novel method that applies intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token
- The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
- General technique of modifying activations to control model behavior.
- Paradigm of finding the right direction in activation space (e.g., linear steering).
- The method can steer the model in both positive and negative directions on the target semantic.
- Using interventions to guide model generation behavior, e.g., adding sentiment vectors at inference time