claim
active
claim:vpd-subcomponents-are-sparse-interpretable-and-avoid-feature-splitting
VPD subcomponents are sparse, interpretable, and avoid feature splitting.
Assertion about the qualitative advantages of VPD's rank-one decomposition.
Source paper
extracted_from (2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart · +4
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Feature splitting · cites · Phenomenon where a feature learned by a small sparse autoencoder (SAE) splits into multiple finer features in a larger SAE.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core interpretative claim that VPD's parameter-based decomposition prevents the feature fragmentation seen in activation-based methods.
- Positioning of VPD as advancing the paradigm of explaining computation in the model's terms.
- Quantitative advantage claimed for VPD over a prior activation-decomposition method.
- Core proposition of the paper: a substrate-level critique of existing interpretability methods.
- The ability to make precise edits demonstrates that VPD identifies real computational machinery · claim · 0.764 · Claim that editing success validates VPD's decomposition.
- First question posed after applying VPD, investigating whether the subcomponents make sense.
- Empirical result demonstrating VPD's efficiency advantage in parameter decomposition.
- Prediction/hypothesis about the direction of the field.
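The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the graph's actual implementation: it assumes every entity has an embedding vector and that the listed score (such as 0.764) is the cosine similarity between embeddings. The function names, the embeddings dictionary, and the typed_edges set are hypothetical; the real embedding model and storage are not described on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_without_edge(target, candidates, embeddings, typed_edges, threshold=0.65):
    """Return (id, score) pairs for candidates whose cosine similarity to
    target is at least threshold and that share no typed edge with target.

    Hypothetical helper: embeddings maps entity id -> vector, and
    typed_edges is a set of (source_id, target_id) pairs.
    """
    results = []
    for cid in candidates:
        if cid == target:
            continue
        # Skip entities already linked by a typed relation in either direction.
        if (target, cid) in typed_edges or (cid, target) in typed_edges:
            continue
        score = cosine(embeddings[target], embeddings[cid])
        if score >= threshold:
            results.append((cid, score))
    # Highest similarity first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy data (hypothetical ids and vectors).
embeddings = {
    "claim:a": [1.0, 0.0],
    "claim:b": [4.0, 3.0],    # cosine with claim:a = 4/5 = 0.8
    "claim:c": [0.0, 1.0],    # cosine with claim:a = 0.0
    "concept:d": [1.0, 0.0],  # identical direction, but has a typed edge
}
typed_edges = {("claim:a", "concept:d")}

related = similar_without_edge("claim:a", list(embeddings), embeddings, typed_edges)
print(related)
```

Only claim:b survives here: claim:c falls below the threshold, and concept:d is excluded because it already has a typed edge, which is why such an entity would appear under a typed section like "Concepts" rather than under "Related by similarity".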