method
active
method:path-patchingPath Patching
Method by Goldowsky-Dill et al. 2023 for localizing model behavior via targeted activation interventions
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
- The core analytical technique of expanding transformer computations from layer-by-layer products into sums of end-to-end path terms for independent analysis
- A set property meaning all coordinate patches of its elements remain within the set; proved equivalent to axis-aligned hyperrectangles
- Neural mechanism for tracking location through accumulation of self-movement vectors; shown to play the role of position encodings in TEM.
- A trajectory through time and space that an agent's care follows, potentially becoming infinite via the bodhisattva vow.
- The mathematical trick of expanding a product of layer terms into a sum of end-to-end path terms, enabling independent analysis of each term
- The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
- Unifying framework for inspecting hidden representations of language models via representation interventions