Residual Stream Activation Patching

Used to localize causally implicated hidden states by swapping activations between true and false inputs.
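The sketch below is a minimal illustration of the patching loop. It assumes a hypothetical toy model in which a few layers mix information across positions, write back into the residual stream, and read out at the last position. `make_toy_model`, `run`, and `patching_sweep` are illustrative names, not taken from the source. The loop caches the residual stream on the true input and replays the false input with one (layer, position) entry swapped in from the true run. It then records the change in the log-probability of a target token.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_toy_model(d=4, n_layers=3, vocab=3, seed=0):
    """Hypothetical toy model: each layer has a mixing matrix A and an MLP matrix M."""
    rng = random.Random(seed)

    def mat(rows, cols, scale):
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    layers = [(mat(d, d, 0.5), mat(d, d, 0.5)) for _ in range(n_layers)]
    return layers, mat(vocab, d, 1.0)


def run(model, embeds, patch=None):
    """Forward pass returning (cache, log_probs).

    cache[layer][p] is the residual stream at position p entering that layer.
    patch = (layer, p, vec) overwrites that residual vector before the layer runs.
    """
    layers, unembed = model
    resid = [list(v) for v in embeds]
    cache = []
    for layer, (A, M) in enumerate(layers):
        if patch is not None and patch[0] == layer:
            resid[patch[1]] = list(patch[2])
        cache.append([list(v) for v in resid])
        # Causal mixing (stand-in for attention): position p reads the mean of positions 0..p.
        mixed = []
        for p in range(len(resid)):
            mean = [sum(col) / (p + 1) for col in zip(*resid[: p + 1])]
            mixed.append(matvec(A, mean))
        resid = [[r + m for r, m in zip(rv, mv)] for rv, mv in zip(resid, mixed)]
        # Per-position MLP, also added back into the residual stream.
        resid = [[r + math.tanh(h) for r, h in zip(rv, matvec(M, rv))] for rv in resid]
    logits = matvec(unembed, resid[-1])  # read out at the final position
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return cache, [z - log_z for z in logits]


def patching_sweep(model, true_embeds, false_embeds, target):
    """effects[layer][p]: change in log p(target) on the false input when the
    residual at (layer, position p) is replaced by its value from the true input."""
    true_cache, _ = run(model, true_embeds)
    _, base = run(model, false_embeds)
    effects = []
    for layer in range(len(true_cache)):
        row = []
        for p in range(len(false_embeds)):
            _, lp = run(model, false_embeds, patch=(layer, p, true_cache[layer][p]))
            row.append(lp[target] - base[target])
        effects.append(row)
    return effects


false_e = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.5, 0.2], [0.0, 0.3, 0.1, -0.4]]
true_e = [[1.0, 0.8, -0.6, 0.4]] + false_e[1:]  # differs only at position 0
model = make_toy_model()
effects = patching_sweep(model, true_e, false_e, target=0)
```

In this toy setup, patching position 0 at layer 0 reproduces the true run exactly. As a result, `effects[0][0]` equals the full true-minus-false log-probability gap, and layer-0 patches at the unchanged positions 1 and 2 have zero effect. From layer 1 onward, the mixing step has already carried the difference into later positions, so patches there can also move the target log-probability.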

Neighborhood — ranked by edge-count

paper

method

Residual Stream Patching
related_to
Technique to localize causally implicated hidden states by swapping residual stream activations between a true and a false input and measuring the resulting change in downstream log-probabilities.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Residual Stream Activation · concept · 0.908
The intermediate representations in transformer layers that are patched and probed for truth information.
layer 40 residual-stream activations · concept · 0.867
Activations from the specific layer (layer 40) of the target models that are extracted for probe construction and SAE training.
Residual Stream · concept · 0.821
Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
Residual-Stream Injection · concept · 0.803
Core activation intervention: add a scaled vector to the residual stream at layer l during completion.
residual stream recovery tracking · method · 0.802
Tracks cosine similarity, norm ratio, and projection onto the injection direction across layers to measure recovery from the perturbation.
Residual Stream Bandwidth · concept · 0.790
The finite capacity of the residual stream, set by its dimensionality, for storing and communicating information between layers; conceptualized as being under high demand.
Activation patching · method · 0.778
Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
residual stream recovery dynamics · concept · 0.776
The network's tendency to actively attenuate injected perturbations over subsequent layers, erasing the signal before it reaches the output.
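Residual-stream injection can be sketched in a hypothetical single-position toy network. `make_blocks` and `forward` are illustrative names, and each block is assumed to compute `h + tanh(W h)`. The scaled vector is added to the residual stream just before block `l` runs, which is one possible reading of "at layer l". In a real language model the same addition would be applied at each generated position during completion.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_blocks(d=4, n_layers=4, seed=0):
    """Hypothetical toy blocks: each is a d x d matrix W applied as h + tanh(W h)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)]
            for _ in range(n_layers)]


def forward(blocks, h0, inject_layer=None, vector=None, scale=0.0):
    """Return the residual stream after each block.

    If inject_layer is given, scale * vector is added to the residual
    stream immediately before that block runs (assumed hook placement).
    """
    h = list(h0)
    trace = []
    for layer, W in enumerate(blocks):
        if layer == inject_layer:
            h = [x + scale * v for x, v in zip(h, vector)]
        # Block output is written back into the residual stream additively.
        h = [x + math.tanh(z) for x, z in zip(h, matvec(W, h))]
        trace.append(list(h))
    return trace


blocks = make_blocks()
h0 = [0.2, -0.1, 0.4, 0.0]
steer = [1.0, 0.0, 0.0, 0.0]  # hypothetical steering direction
clean = forward(blocks, h0)
steered = forward(blocks, h0, inject_layer=2, vector=steer, scale=4.0)
```

Blocks before the injection point are untouched, so `steered[:2]` matches `clean[:2]`. Everything from block 2 onward carries the perturbation.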
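The sketch below computes the three per-layer statistics named for recovery tracking and uses them to show the attenuation described as recovery dynamics. It rests on three assumptions. First, `recovery_metrics` and `first_recovered_layer` are hypothetical helpers. Second, "projection" is taken to mean the projection of the perturbation (perturbed minus clean) onto the unit injection direction. Third, the input streams are synthetic numbers chosen to illustrate attenuation, not measurements from any model.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def recovery_metrics(clean, perturbed, direction):
    """Per-layer recovery statistics for one injected run.

    clean[l] and perturbed[l] are residual-stream vectors at layer l for the
    same position; direction is the injected vector (only its direction is used).
    Assumes no zero-norm vectors.
    """
    length = norm(direction)
    unit = [x / length for x in direction]
    rows = []
    for c, p in zip(clean, perturbed):
        delta = [pi - ci for pi, ci in zip(p, c)]
        rows.append({
            "cosine": dot(c, p) / (norm(c) * norm(p)),
            "norm_ratio": norm(p) / norm(c),
            # Signed size of the perturbation still pointing along the injection.
            "projection": dot(delta, unit),
        })
    return rows


def first_recovered_layer(rows, fraction=0.25):
    """Index of the first layer whose projection has fallen to at most
    `fraction` of its value at the first tracked layer, or None."""
    start = rows[0]["projection"]
    for layer, row in enumerate(rows):
        if row["projection"] <= fraction * start:
            return layer
    return None


# Synthetic illustration only: a fixed clean stream and a perturbation that halves each layer.
clean = [[3.0, 4.0, 0.0, 0.0] for _ in range(4)]
perturbed = [[3.0, 4.0, a, 0.0] for a in (8.0, 4.0, 2.0, 1.0)]
rows = recovery_metrics(clean, perturbed, direction=[0.0, 0.0, 2.0, 0.0])
```

Here the projection falls from 8 to 1 while cosine similarity to the clean stream climbs toward 1 and the norm ratio shrinks toward 1. The 0.25 cutoff in `first_recovered_layer` is an arbitrary illustrative threshold.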