VPD scales to a 4-layer 67M-parameter model trained on The Pile.

Empirical demonstration of VPD on a mid-scale transformer, establishing feasibility.

Source paper

extracted_from

(2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart +4

concept

4-layer 67M-parameter model
cites
The model (trained on The Pile) on which VPD is demonstrated to scale.

dataset

The Pile
cites
Training corpus used for the 67M-parameter model tested with VPD.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

VPD enables manual model editing through direct parameter manipulation. · claim · 0.792
Applied capability claim: VPD enables surgical changes to model behaviour at the parameter level.
VPD achieves better sparsity-reconstruction tradeoff than transcoders on 67M model · finding · 0.789
Empirical result demonstrating VPD's efficiency advantage in parameter decomposition.
VPD decomposes parameters, not activations, flipping the standard SAE / activation-patching paradigm. · claim · 0.772
Core proposition of the paper: a substrate-level critique of existing interpretability methods.
Do VPD's mechanistic faithfulness and interpretability survive at frontier model scale? · question · 0.769
Open research question about whether VPD generalizes beyond the tested 67M-parameter regime.
The ability to make precise edits demonstrates that VPD identifies real computational machinery · claim · 0.750
Claim that editing success validates VPD's decomposition.
VPD subcomponents are sparse, interpretable, and avoid feature splitting. · claim · 0.749
Assertion about the qualitative advantages of VPD's rank-one decomposition.
VPD identifies real, computational structure in neural network parameters · claim · 0.746
Central claim that VPD successfully uncovers genuine mechanisms.
Phi-4 shows U-shaped cohesion with falling mismatch; peak depth varies by model · finding · 0.741
E3 backbone-specific finding showing three-stage trajectory generalizes across architectures
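
The neighbor rule stated for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the entity keys, the toy 3-dimensional embeddings, and the `untyped_neighbors` helper are hypothetical, and the page does not say how embeddings or scores are actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to the target, sorted by descending score."""
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target by a typed edge.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the page.
embeddings = {
    "vpd_scales_67m": [1.0, 0.0, 0.0],
    "vpd_model_editing": [0.8, 0.6, 0.0],  # similar, no typed edge
    "the_pile": [0.9, 0.1, 0.0],           # similar, but has a typed edge
    "unrelated": [0.0, 0.0, 1.0],          # orthogonal, below threshold
}
typed_edges = {("vpd_scales_67m", "the_pile")}

result = untyped_neighbors("vpd_scales_67m", embeddings, typed_edges)
for name, score in result:
    print(f"{name} · {score:.3f}")
```

In this toy setup only the entity without a typed edge and with a score above the threshold is returned; entities already connected by a typed relation are excluded even when their similarity is high, which matches the list's stated purpose of surfacing candidate new edges or unrecognized duplicates.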