framework
active
framework:activation-addition-actadd
Activation Addition (ActAdd)
Steering method that derives a vector from the activation difference of a contrastive prompt pair and adds it to the model's activations starting at the first token position.
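A minimal sketch of the two steps named above (difference of contrastive activations, then addition at the leading token positions) may help make the description concrete. It uses plain Python lists in place of real model activations. The function names `steering_vector` and `add_steering`, the scaling coefficient `coeff`, and the toy numbers are all illustrative assumptions, not taken from the entry or from any library. Which layer the activations come from is left out entirely.

```python
from typing import List

# One row per token position, one column per hidden dimension (toy stand-in
# for real model activations; shapes and values are assumptions).
Matrix = List[List[float]]


def steering_vector(pos_acts: Matrix, neg_acts: Matrix) -> Matrix:
    """Position-wise activation difference for a contrastive prompt pair."""
    if len(pos_acts) != len(neg_acts):
        raise ValueError("both prompts must cover the same number of token positions")
    return [
        [p - n for p, n in zip(p_row, n_row)]
        for p_row, n_row in zip(pos_acts, neg_acts)
    ]


def add_steering(resid: Matrix, vec: Matrix, coeff: float) -> Matrix:
    """Add coeff * vec to the leading positions of resid; later positions are unchanged."""
    if len(vec) > len(resid):
        raise ValueError("steering vector is longer than the sequence being steered")
    steered = [row[:] for row in resid]  # copy so the input is not mutated
    for t, v_row in enumerate(vec):
        steered[t] = [r + coeff * v for r, v in zip(resid[t], v_row)]
    return steered


# Toy activations for a hypothetical contrastive pair, two token positions each.
pos = [[1.0, 0.0], [0.5, 0.5]]
neg = [[0.0, 1.0], [0.5, -0.5]]
vec = steering_vector(pos, neg)

# Toy activations for the prompt being steered, three token positions.
resid = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
steered = add_steering(resid, vec, coeff=2.0)
print(steered)
```

In this sketch only the first two positions change, because the steering vector spans two positions; the third row passes through untouched. In a real model the same addition would typically be applied inside a forward hook at a chosen layer, which is not shown here.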
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Alexander Matt Turner · introduces · Lead author of the Activation Engineering paper; foundational for the additive steering paradigm
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Method by Turner et al. for real-time output control via activation engineering, cited as foundation for this paper's steering approach
- Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
- Adding steering vector in forward direction to push model activations toward stronger reflective behavior.
- An existing activation steering method used as comparative baseline.
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Changing configuration to sample environment differently; minimizes free energy.
- The generic addition mechanism that Llama-3.1-8B actually uses to compute sums before mapping back to cyclic concept space
- Latent model activations when processing inputs framed from another agent's perspective