method
active
method:contrastive-activation-addition-caa
Contrastive Activation Addition (CAA)
An existing activation steering method used as a comparative baseline.
Neighborhood — ranked by edge-count
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Performance gains over CAA in steering tasks.
- Adds a steering vector in the forward direction to push model activations toward stronger reflective behavior.
- Steering method that derives vectors from contrastive prompt pairs and adds them to first-token activations.
- Intervention method that adds a learned direction vector to residual stream activations to steer model behavior.
- Prior finding from related work that aligns with ESR being strongest in the largest model tested
- Method by Turner et al. for real-time output control via activation engineering, cited as the foundation for this paper's steering approach.
- Method comparing brain activity in conscious vs. unconscious conditions.
- Core technique: takes mean difference of model activations on contrastive prompts and adds the resulting vector to the residual stream at inference time.
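The core technique in the last entry (mean difference of activations on contrastive prompts, added to the residual stream at inference time) can be sketched on toy arrays. This is a minimal illustration, not the implementation from any cited work: the function names `steering_vector` and `add_steering`, the scaling factor `alpha`, and the synthetic activations are all assumptions made for the example. In a real model the activations would be read from a chosen transformer layer rather than generated.

```python
import numpy as np


def steering_vector(pos_acts, neg_acts):
    """Mean activation on positive prompts minus mean activation on negative prompts.

    Both inputs have shape (n_prompts, d_model). Hypothetical helper.
    """
    pos_acts = np.asarray(pos_acts, dtype=float)
    neg_acts = np.asarray(neg_acts, dtype=float)
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def add_steering(residual, vector, alpha=1.0):
    """Add the scaled steering vector to every position of a (seq_len, d_model) array.

    `alpha` is an assumed scaling coefficient; a negative value steers away
    from the behavior instead of toward it.
    """
    return np.asarray(residual, dtype=float) + alpha * vector


# Synthetic stand-ins for layer activations (assumption: d_model = 4).
rng = np.random.default_rng(0)
d_model = 4
direction = np.array([1.0, 0.0, 0.0, 0.0])
noise = rng.normal(scale=0.1, size=(8, d_model))

pos = noise + direction  # activations on prompts showing the behavior
neg = noise - direction  # activations on the contrasting prompts

v = steering_vector(pos, neg)

# Stand-in for the residual stream of a 3-token sequence at inference time.
residual = np.zeros((3, d_model))
steered = add_steering(residual, v, alpha=2.0)
```

Because the toy positive and negative sets share the same noise, the mean difference isolates the planted direction. With real prompt pairs the noise does not cancel exactly, which is why the mean is taken over many pairs.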