Vector Product Decomposition (VPD) for neural network interpretability

A bottom-up mechanistic interpretability method that avoids the feature-splitting limitations of sparse autoencoders (SAEs) and can be applied across model architectures.

4 members.


Drawn from 2 sources



Future interpretability techniques will fundamentally resemble VPD. (Prediction/hypothesis about the direction of the field.)
VPD can be applied to any neural network architecture. (Claim of generality, highlighted as a key strength.)
VPD is a meaningful step toward bottom-up interpretability. (Positions VPD as advancing the paradigm of explaining a model's computation in the model's own terms.)
VPD subcomponents avoid feature splitting, improving interpretability over the SAE approach. (Core interpretive claim: VPD decomposes parameters rather than activations, which prevents the feature fragmentation seen in activation-based methods.)
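The claims above describe VPD only at a high level and do not spell out its algorithm. As a toy illustration of what a parameter-based decomposition means, as opposed to decomposing activations, the sketch below splits a single weight matrix W into hypothetical rank-one subcomponents u_i v_i^T. Reading "vector product" as an outer product is an assumption, and the subcomponent values, the function names (`assemble`, `contributions`, `active`) and the activity threshold are all illustrative. This is not VPD's actual procedure or training objective. The sketch shows that the subcomponents sum back to W, and that a layer's output on an input x can be attributed to whichever subcomponents are active on x.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical subcomponent: a pair (u_i, v_i) standing for the rank-one matrix u_i v_i^T.
Subcomponent = Tuple[Vector, Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def outer(u: Vector, v: Vector) -> Matrix:
    """Rank-one matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in w]


def assemble(subs: List[Subcomponent]) -> Matrix:
    """Recombine subcomponents into the full weight matrix W = sum_i u_i v_i^T."""
    rows, cols = len(subs[0][0]), len(subs[0][1])
    w = [[0.0] * cols for _ in range(rows)]
    for u, v in subs:
        piece = outer(u, v)
        w = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(w, piece)]
    return w


def contributions(subs: List[Subcomponent], x: Vector) -> List[Vector]:
    """Per-subcomponent output u_i * (v_i . x); these sum to W x."""
    return [[ui * dot(v, x) for ui in u] for u, v in subs]


def active(subs: List[Subcomponent], x: Vector, tol: float = 1e-9) -> List[int]:
    """Indices of subcomponents whose contribution on x is non-negligible (illustrative threshold)."""
    return [i for i, (_, v) in enumerate(subs) if abs(dot(v, x)) > tol]


# Illustrative values only, not taken from any trained model.
subcomponents: List[Subcomponent] = [
    ([1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 2.0], [0.0, 1.0, 0.0]),
    ([1.0, 1.0], [0.0, 0.0, 1.0]),
]

if __name__ == "__main__":
    W = assemble(subcomponents)
    x = [3.0, 0.0, 0.0]
    parts = contributions(subcomponents, x)
    summed = [sum(col) for col in zip(*parts)]
    print("W =", W)
    print("W x =", matvec(W, x), "sum of contributions =", summed)
    print("active subcomponents:", active(subcomponents, x))
```

Here W assembles to `[[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]`, and on x = `[3.0, 0.0, 0.0]` only subcomponent 0 is active, so the entire output `[3.0, 0.0]` is attributed to it. The decomposition lives in the weights, not in a learned dictionary over activations. The sketch only illustrates that distinction; it does not by itself demonstrate the feature-splitting claim.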