method
active
method:residual-stream-activation-patching
Residual Stream Activation Patching
Technique used to localize causally implicated hidden states by swapping residual stream activations between a true input and a false input
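As a rough illustration of this swap, the sketch below patches a toy residual stack written in NumPy. Everything in it is an assumption made for illustration: the model (`run`, the causal averaging matrix `mix`, the random weights), the token ids, and the choice of `target`. It is not the setup of any paper linked from this entry. A "clean" (true) run is cached, the "corrupted" (false) run is repeated with one residual stream state overwritten from that cache, and the change in the target's log-probability is recorded for each (layer, position) pair.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, L, V = 4, 8, 3, 5  # positions, width, layers, vocab (toy sizes, hypothetical)

embed = rng.normal(size=(V, D))
weights = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(L)]
unembed = rng.normal(size=(D, V))

# Stand-in for causal attention: each position averages itself and earlier ones.
mix = np.tril(np.ones((T, T)))
mix /= mix.sum(axis=1, keepdims=True)


def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def run(tokens, patch=None):
    """Run the toy model; return final-position log-probs and per-layer residual states.

    patch = (layer, position, vector) overwrites the residual stream at the
    input of `layer` at `position` before that layer's update is applied.
    """
    h = embed[tokens]  # (T, D); fancy indexing returns a fresh array
    cache = []
    for l, W in enumerate(weights):
        if patch is not None and patch[0] == l:
            h = h.copy()
            h[patch[1]] = patch[2]
        cache.append(h.copy())
        h = h + np.tanh(mix @ h @ W)  # residual update
    return log_softmax(h[-1] @ unembed), cache


clean_tokens = np.array([0, 1, 2, 3])    # stands in for the "true" input
corrupt_tokens = np.array([0, 4, 2, 3])  # "false" input, differs only at position 1
target = 2                               # hypothetical answer token

clean_lp, clean_cache = run(clean_tokens)
corrupt_lp, _ = run(corrupt_tokens)

# effect[l, p]: gain in target log-prob when the clean state at (l, p)
# is patched into the corrupted run.
effect = np.zeros((L, T))
for l in range(L):
    for p in range(T):
        patched_lp, _ = run(corrupt_tokens, patch=(l, p, clean_cache[l][p]))
        effect[l, p] = patched_lp[target] - corrupt_lp[target]

print(np.round(effect, 3))
```

Under these assumptions, `effect[0, 1]` equals the full clean-minus-corrupted gap, because position 1 is the only place the two inputs differ at layer 0. Column 0 is zero, because the causal mixing keeps position 0 blind to the changed token. In a real transformer the same loop would typically be implemented with forward hooks rather than a hand-written `run`.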
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Residual Stream Patching (related_to): Technique to localize causally implicated hidden states by swapping residual stream activations between a true and a false input and measuring downstream log-probability changes
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The intermediate representations in transformer layers whose activations are patched and probed for truth information
- The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
- Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
- Core activation intervention: add scaled vector to residual stream at layer l during completion
- Tracks cosine similarity, norm ratio, and injection direction projection across layers to measure recovery from perturbation
- The finite dimensional capacity of the residual stream for storing and communicating information between layers; conceptualized as being under high demand
- Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
- The network's tendency to actively attenuate injected perturbations over subsequent layers, erasing the signal before output
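Two of the neighbors above describe adding a scaled vector to the residual stream at layer l and then tracking cosine similarity, norm ratio, and projection onto the injection direction across layers. A minimal NumPy sketch of that bookkeeping follows. The toy residual stack, `alpha`, `inject_layer`, and the random `direction` are all hypothetical stand-ins, not values or code from any of the listed entities.

```python
import numpy as np

rng = np.random.default_rng(1)
D, L = 16, 6  # width and depth of a toy residual stack (hypothetical)
weights = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(L)]


def forward(h0, inject_layer=None, delta=None):
    """Return the residual state at the input of every layer.

    If inject_layer is given, delta is added to the residual stream
    at the input of that layer.
    """
    h = h0.copy()
    states = []
    for l, W in enumerate(weights):
        if l == inject_layer:
            h = h + delta
        states.append(h.copy())
        h = h + np.tanh(h @ W)  # residual update
    return states


h0 = rng.normal(size=D)
direction = rng.normal(size=D)
direction /= np.linalg.norm(direction)  # unit-norm injection direction
alpha = 4.0                             # scale of the injected vector
inject_layer = 2

base = forward(h0)
pert = forward(h0, inject_layer, alpha * direction)

# One row per layer: cosine similarity, norm ratio, projection on the direction.
metrics = np.array([
    (
        b @ p / (np.linalg.norm(b) * np.linalg.norm(p)),
        np.linalg.norm(p) / np.linalg.norm(b),
        (p - b) @ direction,
    )
    for b, p in zip(base, pert)
])

for l, (cos, ratio, proj) in enumerate(metrics):
    print(f"layer {l}: cos={cos:.3f} norm_ratio={ratio:.3f} proj={proj:.3f}")
```

Before the injection layer the two runs coincide, so cosine similarity and norm ratio are 1 and the projection is 0. At the injection layer the projection equals `alpha`. A randomly initialized stack like this one has no reason to attenuate the perturbation, so any recovery pattern would have to be measured on a trained model.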