artifact
active
artifact:github-com-tim-hua-01-steering-eval-awareness-public
github.com/tim-hua-01/steering-eval-awareness-public
Open-sourced code for all steering and evaluation experiments in the paper.
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Core technique: computes the mean difference between model activations on contrastive prompts, then adds the resulting vector to the residual stream at inference time.
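
The core technique above can be illustrated with a minimal, dependency-free sketch. This is not the repository's code: the function names (`mean_vector`, `steering_vector`, `apply_steering`), the `scale` parameter, and the toy two-dimensional activations are all hypothetical, chosen only to show the arithmetic. A real setup would presumably read activations from a chosen layer of the model and add the vector back through a hook on that layer.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean over a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Mean activation on one side of the contrast minus the mean on the other."""
    mean_pos = mean_vector(pos_acts)
    mean_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def apply_steering(residual: List[Vector], vec: Vector, scale: float = 1.0) -> List[Vector]:
    """Add scale * vec to every token position of a residual-stream slice.

    `scale` is an assumed knob for steering strength, not a documented setting.
    """
    return [[h + scale * v for h, v in zip(row, vec)] for row in residual]


# Toy activations (hypothetical values): one row per contrastive prompt.
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [2.0, 1.0]]
vec = steering_vector(pos, neg)

# Toy residual stream: one row per token position.
resid = [[0.0, 0.0], [1.0, 1.0]]
steered = apply_steering(resid, vec, scale=0.5)
```

Keeping the vector computation separate from its application mirrors the two stages in the description: the vector is computed once from the contrastive prompts and can then be reused at inference time with different strengths.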