claim
active
claim:vpd-decomposes-parameters-not-activations-flipping-the-standard-sae-activation-patching-paradigm

VPD decomposes parameters, not activations, flipping the standard SAE / activation-patching paradigm.

Core proposition of the paper: a critique of existing interpretability methods at the level of the substrate they decompose (activations rather than parameters).

Source paper

extracted_from
Interpreting Language Model Parameters
(2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart +4

Neighborhood — ranked by edge-count

Methods (2)

method
  • Sparse autoencoders (SAEs): interpretability method criticized in this paper for shattering manifolds into atomic pieces, obscuring overarching semantic structure.
  • Activation patching: standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
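
The contrast between intervening on activations and intervening on parameters can be made concrete with a toy linear layer. The sketch below is only an illustration of the two substrates, not the paper's algorithm: the layer, the inputs, and the use of an SVD to split the weight matrix into rank-one components are all assumptions chosen for brevity, and VPD's actual decomposition may look quite different.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # toy weight matrix of one linear layer (hypothetical)
x = rng.normal(size=3)        # input for the "clean" run
x_alt = rng.normal(size=3)    # input for a counterfactual run

# Activation-side intervention (activation-patching style):
# overwrite one coordinate of the layer output with its value from the other run.
act = W @ x
act_alt = W @ x_alt
patched_act = act.copy()
patched_act[0] = act_alt[0]

# Parameter-side intervention:
# split W into rank-one components and remove one of them from the weights.
# SVD is used here purely as a stand-in decomposition.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
components = [S[i] * np.outer(U[:, i], Vt[i]) for i in range(len(S))]
W_ablated = W - components[0]
ablated_out = W_ablated @ x
```

In the first case the weights are untouched and only the intermediate value changes for this one input. In the second the edit lives in the weights, so it applies to every input the layer sees; because the layer is linear, the change in output equals the removed component applied to `x`.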
