artifact
active
artifact:github-com-tim-hua-01-steering-eval-awareness-public

github.com/tim-hua-01/steering-eval-awareness-public

Open-sourced code for all steering and evaluation experiments in the paper.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Core technique: computes the mean difference between model activations on two sets of contrastive prompts, then adds the resulting vector to the residual stream at inference time.
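
A minimal sketch of the mean-difference steering idea described above, written in plain Python on toy vectors. This is not the repository's code: the function names (`mean_vector`, `steering_vector`, `apply_steering`), the `scale` parameter, and the choice to add the vector at every token position are illustrative assumptions. In practice the activations would come from a chosen layer of the model rather than hand-written lists.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Mean activation on one prompt set minus mean activation on the contrasting set."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def apply_steering(residual: List[Vector], direction: Vector, scale: float = 1.0) -> List[Vector]:
    """Add scale * direction to each token position of a residual stream.

    `scale` and "every position" are assumptions made for this sketch.
    """
    return [[h + scale * d for h, d in zip(token, direction)] for token in residual]


# Toy activations (hypothetical values, 2 prompts per set, hidden size 2).
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [1.0, 1.0]]

v = steering_vector(pos, neg)

# Toy residual stream with two token positions.
steered = apply_steering([[0.0, 0.0], [1.0, 1.0]], v, scale=2.0)
print(v, steered)
```

With a real model, the same addition would typically be done inside a forward hook on the chosen layer, so that the shifted residual stream flows into all later layers.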