claim
active
claim:vpd-enables-manual-model-editing-through-direct-parameter-manipulation
VPD enables manual model editing through direct parameter manipulation.
Applied capability claim: VPD enables surgical changes to model behaviour at the parameter level.
Source paper
extracted_from (2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart +4
Neighborhood (ranked by edge count)
Concepts (2)
concept
- Manual model editing (cites): Ability to surgically alter model behavior through direct parameter changes rather than activation interventions.
- Direct editing of model parameters, enabled by VPD's decomposition, for manual model editing.
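As a rough illustration of what editing "at the parameter level" can mean, the toy sketch below stores a weight matrix as a sum of rank-one subcomponents, W = Σ_c u_c v_cᵀ, and makes an edit by deleting one subcomponent from that sum. This is a sketch under stated assumptions, not VPD's procedure: the subcomponents are hand-picked rather than learned with adversarial ablations, their input directions v_c are assumed orthogonal so that the off-target effect is exactly zero, and the helpers (`compose`, `ablate`, `matvec`) are hypothetical names.

```python
# Toy sketch (hypothetical helpers, not VPD's implementation): a weight matrix
# stored as rank-one subcomponents W = sum_c u_c v_c^T, edited by removing one.

Vector = list[float]
Matrix = list[list[float]]


def outer(u: Vector, v: Vector) -> Matrix:
    """Rank-one matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def compose(subcomponents: list[tuple[Vector, Vector]], rows: int, cols: int) -> Matrix:
    """Sum the rank-one subcomponents into a full rows x cols weight matrix."""
    w = [[0.0] * cols for _ in range(rows)]
    for u, v in subcomponents:
        r1 = outer(u, v)
        w = [[a + b for a, b in zip(rw, rr)] for rw, rr in zip(w, r1)]
    return w


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def ablate(subcomponents: list[tuple[Vector, Vector]], drop: set[int]) -> list[tuple[Vector, Vector]]:
    """Parameter-level edit: delete the selected subcomponents from the sum."""
    return [sc for i, sc in enumerate(subcomponents) if i not in drop]


# Two subcomponents with orthogonal input directions (assumed for clarity).
subs = [([1.0, 2.0], [1.0, 0.0]),   # subcomponent 0: reads input feature 0
        ([3.0, -1.0], [0.0, 1.0])]  # subcomponent 1: reads input feature 1

w = compose(subs, 2, 2)
w_edit = compose(ablate(subs, {0}), 2, 2)

target, off_target = [1.0, 0.0], [0.0, 1.0]
print(matvec(w, target), matvec(w_edit, target))          # [1.0, 2.0] [0.0, 0.0]
print(matvec(w, off_target), matvec(w_edit, off_target))  # [3.0, -1.0] [3.0, -1.0]
```

The edit removes the target behavior entirely while leaving the off-target output untouched, but only because the two input directions are orthogonal here; with learned, overlapping subcomponents some off-target change would be expected, which is the kind of effect that comparisons against fine-tuning measure.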
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The ability to make precise edits demonstrates that VPD identifies real computational machinery (claim · 0.846): Claim that editing success validates VPD's decomposition.
- Application enabled by VPD: direct manipulation of weight matrices for interpretable model modification.
- Core proposition of the paper: a substrate-level critique of existing interpretability methods.
- Empirical demonstration of VPD on a mid-scale transformer, establishing feasibility.
- The VPD-based edit has similarly low off-target effects as uninterpretable fine-tuning methods (finding · 0.781): Performance comparison showing subcomponent editing is comparable to fine-tuning in preserving off-target behavior.
- Core methodological framework introduced in this paper; decomposes weight matrices into rank-one interpretable subcomponents using adversarial ablations.
- Positioning of VPD as advancing the paradigm of explaining computation in the model's terms.
- Claim of generality, highlighted as a key strength.
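The "related by similarity" rule on this page (cosine ≥ 0.65, no typed edge) can be sketched as a filter over entity embeddings. The entity ids, the embedding vectors, and the descending sort by score are assumptions for illustration; the page does not say how its embeddings are produced or how the list is ordered.

```python
import math

# Sketch of the "related by similarity" rule: cosine >= 0.65 and no typed edge.
# Entity ids and embedding vectors below are hypothetical.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_untyped(target_id: str, embeddings: dict[str, list[float]],
                    typed_neighbors: set[str], threshold: float = 0.65) -> list[tuple[str, float]]:
    """Return (id, score) pairs with score >= threshold, skipping the target
    itself and anything already linked by a typed edge, highest score first."""
    target = embeddings[target_id]
    scored = [
        (eid, cosine(target, vec))
        for eid, vec in embeddings.items()
        if eid != target_id and eid not in typed_neighbors
    ]
    hits = [(eid, s) for eid, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


embeddings = {
    "editing-claim": [1.0, 0.0],     # the entity this page describes
    "edit-concept": [1.0, 0.0],      # identical, but already a typed neighbor
    "validation-claim": [1.0, 1.0],  # cosine = 1/sqrt(2), about 0.707
    "weight-edit-app": [3.0, 1.0],   # cosine = 3/sqrt(10), about 0.949
    "unrelated": [0.0, 1.0],         # cosine = 0.0, below threshold
}
related = similar_untyped("editing-claim", embeddings, typed_neighbors={"edit-concept"})
print([eid for eid, _ in related])  # ['weight-edit-app', 'validation-claim']
```

Excluding typed neighbors before thresholding is what makes the surviving entries useful as candidates for new edges or as possible duplicates: anything already connected by a typed relation is, by construction, not reported here.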