VPD subcomponents are sparse, interpretable, and avoid feature splitting.

Assertion about the qualitative advantages of VPD's rank-one decomposition.

Source paper

extracted_from

(2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart +4

concept

Feature splitting

cites

Phenomenon in which a feature learned by a small sparse autoencoder (SAE) splits into multiple finer-grained features in a larger SAE.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood as this one but with no typed relation to it. They are candidates for new edges, or may be unrecognized duplicates.

VPD subcomponents avoid feature splitting, improving interpretability over SAE approach · claim · 0.885
Core interpretative claim that VPD's parameter-based decomposition prevents the feature fragmentation seen in activation-based methods.
VPD is a meaningful step toward bottom-up interpretability · claim · 0.803
Positioning of VPD as advancing the paradigm of explaining computation in the model's terms.
VPD achieves a better sparsity-reconstruction tradeoff than transcoders. · claim · 0.790
Quantitative advantage claimed for VPD over a prior activation-decomposition method.
VPD decomposes parameters, not activations, flipping the standard SAE / activation-patching paradigm. · claim · 0.769
Core proposition of the paper: a substrate-level critique of existing interpretability methods.
The ability to make precise edits demonstrates that VPD identifies real computational machinery · claim · 0.764
Claim that editing success validates VPD's decomposition.
are the resulting parameter subcomponents actually interpretable objects? · question · 0.763
First question posed after applying VPD, investigating whether the subcomponents make sense.
VPD achieves better sparsity-reconstruction tradeoff than transcoders on 67M model · finding · 0.762
Empirical result demonstrating VPD's efficiency advantage in parameter decomposition.
Future interpretability techniques will fundamentally resemble VPD · claim · 0.757
Prediction/hypothesis about the direction of the field.