framework
active
framework:activation-addition-actadd

Activation Addition (ActAdd)

Steering method that derives a steering vector from the activation difference of a contrastive prompt pair and adds it to the model's activations at the first token positions during the forward pass.
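
The description above can be illustrated with a minimal sketch on toy arrays. It assumes the activations of the two contrastive prompts have already been captured at one layer and share the same token length. The function names, the coefficient `coeff`, and the random toy data are hypothetical and chosen for illustration only. This is not the authors' implementation.

```python
import numpy as np


def steering_vector(act_pos, act_neg, coeff):
    """Return coeff * (act_pos - act_neg).

    Both inputs are (n_tokens, d_model) activations captured at the same
    layer for the two prompts of a contrastive pair (assumed same length).
    """
    if act_pos.shape != act_neg.shape:
        raise ValueError("contrastive activations must have the same shape")
    return coeff * (act_pos - act_neg)


def add_at_front(resid, vec):
    """Add vec to the first positions of resid, leaving later positions as-is.

    resid is (seq_len, d_model); vec is (n_tokens, d_model). A copy is
    returned so the original activations are not modified.
    """
    n = min(vec.shape[0], resid.shape[0])
    out = resid.copy()
    out[:n] += vec[:n]
    return out


# Toy data standing in for real model activations (hypothetical values).
rng = np.random.default_rng(0)
d_model = 4
act_pos = rng.normal(size=(2, d_model))  # e.g. activations for the "positive" prompt
act_neg = rng.normal(size=(2, d_model))  # e.g. activations for the "negative" prompt
resid = rng.normal(size=(5, d_model))    # activations for the prompt being steered

vec = steering_vector(act_pos, act_neg, coeff=3.0)
steered = add_at_front(resid, vec)
```

In a real model the same addition would typically be applied inside a forward hook at the chosen layer; the layer index and coefficient are tuning choices that this sketch does not settle.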

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • Lead author of Activation Engineering paper; foundational for additive steering paradigm

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Method by Turner et al. for real-time output control via activation engineering, cited as foundation for this paper's steering approach
  • Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
  • Adding a steering vector in the forward direction to push model activations toward stronger reflective behavior.
  • An existing activation steering method used as comparative baseline.
  • Activations · concept · 0.777
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • action · concept · 0.759
    Changing configuration to sample environment differently; minimizes free energy.
  • Base-10 addition · concept · 0.748
    The generic addition mechanism that Llama-3.1-8B actually uses to compute sums before mapping back to cyclic concept space
  • Latent model activations when processing inputs framed from another agent's perspective