method
pending-review
method:adversarial-parameter-decomposition-vpd
Adversarial Parameter Decomposition (VPD)
paper.md
Frontmatter (9 fields)
{
"doc": "paper.md",
"context": "Core technique introduced in this paper for decomposing neural network weight matrices into mechanistically simple, interpretable rank-one subcomponents.",
"norm_label": "Adversarial Parameter Decomposition (VPD)",
"graphify_id": "vpd_method",
"source_file": "paper.md",
"imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/paper/graph.json",
"extracted_type": "method",
"source_location": "§1",
"graphify_file_type": "method"
}
Outgoing (5)
Implements (2)
- 67M-parameter language model (dataset)
- causal importance network (concept)
Supports (3)
- Attribution graph tracing information flow across parameter subcomponents for specific model predictions (e.g., 'her' vs 'his' pronoun selection) (finding)
- Direct model editing via parameter subcomponent modification: emoticon eye recognition altered to predict shocked faces with no retraining (finding)
- Identification of algorithms implemented in attention layers, distributed across attention heads (finding)
Mentions (1)
- papers-typed
paper.md