method
active
method:steering-sign-validation-test
Steering-sign validation test
Validation filter: steering a model along a concept must shift its self-report for that same concept in the expected direction. Concept-model pairs that fail this check are excluded as invalid.
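The filter can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the source: the `SteeringRun` container, the use of a mean difference as the shift statistic, the `min_shift` threshold, and the placeholder concept and model names. The scores are made-up inputs, not reported results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SteeringRun:
    """Hypothetical record of one concept-model pair (names are illustrative)."""
    concept: str
    model: str
    baseline: list[float]  # self-report scores with no steering applied
    steered: list[float]   # self-report scores while steering toward the concept


def passes_sign_test(run: SteeringRun, expected_sign: int = 1, min_shift: float = 0.0) -> bool:
    """Return True if steering moved the mean self-report in the expected direction.

    Assumption: the shift is measured as a difference of means. The source does
    not say which statistic or threshold was actually used.
    """
    shift = mean(run.steered) - mean(run.baseline)
    return shift * expected_sign > min_shift


def filter_valid_pairs(runs: list[SteeringRun]) -> list[tuple[str, str]]:
    """Keep only the (concept, model) pairs that pass the sign test."""
    return [(r.concept, r.model) for r in runs if passes_sign_test(r)]


# Made-up example data: the first pair shifts upward as expected, the second does not.
runs = [
    SteeringRun("concept_a", "model_x", baseline=[3.0, 3.2], steered=[4.1, 4.0]),
    SteeringRun("concept_b", "model_y", baseline=[3.0, 3.1], steered=[2.5, 2.7]),
]
valid_pairs = filter_valid_pairs(runs)
```

In this sketch a pair that fails is simply dropped from `valid_pairs`, which mirrors how the test is described as an exclusion filter applied before downstream analysis.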
Neighborhood — ranked by edge-count
Methods (1)
method
- Activation Steering (extends): Causal intervention technique: edit NLA explanation, reconstruct via AR, use difference as steering vector to manipulate model behavior.
Claims (1)
claim
- Methodological contribution: used to exclude focus-1B and impulsivity-8B from scaling analysis
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Mechanism claim supported by transcript analysis and the fact that the steering vector was extracted from a model that never writes type hints.
- A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
- CIMC's methodology for evaluating whether a built system is conscious: combining multiple forms of evidence including predicted functional organization and developmental trajectories
- Framework of using internal-state representations to control or steer generative models; conceptually parallel to manifold steering in language models.
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
- The paper demonstrates the bidirectional geometry-behavior relationship across multiple tasks and modalities (language models and video world models)
- Method where Kimi K2.5 steers its own SAE features in real time and reports on its internal emotional state
- Alternative to inference-time activation capping: applying persona steering during training to deeply anchor models; cited from Chen et al.