claim

active

claim:vpd-can-be-arbitrarily-applied-to-any-neural-network-architecture

VPD can be arbitrarily applied to any neural network architecture

Claim of generality, highlighted as a key strength.

Source paper

extracted_from

Neighborhood — ranked by edge-count

Communities (4)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Mechanistic interpretability via parameter decomposition
members_of
Tracing information flow through weight matrices and attention heads using attribution graphs to identify causally important subcomponents in language models.
Vector Product Decomposition for neural interpretability
members_of
Bottom-up mechanistic interpretability method avoiding feature splitting limitations of sparse autoencoders, applicable across architectures.
Virtually Planned Decomposition interpretability
members_of
VPD as a bottom-up method for identifying real computational structure in neural networks.

Methods (1)

method

Adversarial Parameter Decomposition (VPD)
associated_with
Core technique introduced in this paper for decomposing neural network weight matrices into mechanistically simple, interpretable rank-one subcomponents.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

VPD identifies real, computational structure in neural network parameters · claim · 0.805
Central claim that VPD successfully uncovers genuine mechanisms.
VPD enables manual model editing through direct parameter manipulation. · claim · 0.759
Applied capability claim: VPD enables surgical changes to model behaviour at the parameter level.
The ability to make precise edits demonstrates that VPD identifies real computational machinery · claim · 0.755
Claim that editing success validates VPD's decomposition.
VPD decomposes parameters, not activations, flipping the standard SAE / activation-patching paradigm. · claim · 0.738
Core proposition of the paper: a substrate-level critique of existing interpretability methods.
VPD subcomponents are sparse, interpretable, and avoid feature splitting. · claim · 0.737
Assertion about the qualitative advantages of VPD's rank-one decomposition.
Ultimately, we would like to understand neural networks well enough to be able to intentionally design them. · quote · 0.737
Vision statement in the conclusion.
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces. · quote · 0.737
The paper's central thesis statement, presented prominently after the abstract.
VPD achieves a better sparsity-reconstruction tradeoff than transcoders. · claim · 0.732
Quantitative advantage claimed for VPD over a prior activation-decomposition method.