method
active
method:path-patching

Path Patching

Method introduced by Goldowsky-Dill et al. (2023) for localizing model behavior via targeted activation interventions

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
  • The core analytical technique of expanding transformer computations from layer-by-layer products into sums of end-to-end path terms for independent analysis
  • Patch-Closure · concept · 0.778
    A set property meaning all coordinate patches of its elements remain within the set; proved equivalent to axis-aligned hyperrectangles
  • Path Integration · concept · 0.776
    Neural mechanism for tracking location through accumulation of self-movement vectors; shown to play the role of position encodings in TEM.
  • care path · concept · 0.768
    A trajectory through time and space that an agent's care follows, potentially becoming infinite via the bodhisattva vow.
  • The mathematical trick of expanding a product of layer terms into a sum of end-to-end path terms, enabling independent analysis of each term
  • The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
  • Patchscopes · framework · 0.764
    Unifying framework for inspecting hidden representations of language models via representation interventions