Adversarial Parameter Decomposition (VPD)

paper.md

Frontmatter (9 fields)

{
  "doc": "paper.md",
  "context": "Core technique introduced in this paper for decomposing neural network weight matrices into mechanistically simple, interpretable rank-one subcomponents.",
  "norm_label": "Adversarial Parameter Decomposition (VPD)",
  "graphify_id": "vpd_method",
  "source_file": "paper.md",
  "imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/paper/graph.json",
  "extracted_type": "method",
  "source_location": "§1",
  "graphify_file_type": "method"
}

Outgoing (5)

Implements (2)

67M-parameter language model(dataset)
causal importance network(concept)

Supports (3)

Attribution graph tracing information flow across parameter subcomponents for specific model predictions (e.g., 'her' vs 'his' pronoun selection)(finding)
Direct model editing via parameter subcomponent modification: emoticon eye recognition altered to predict shocked faces with no retraining(finding)
Identification of algorithms implemented in attention layers, distributed across attention heads(finding)

Incoming (4)

Associated with (3)

A good parameter subcomponent is causally important only for specific roles and can be removed from the model without hurting performance on irrelevant prompts(claim)
Rank-one matrix decomposition constraint enforcing mechanistic simplicity(claim)
sparse autoencoders(method)

Introduces (1)

Goodfire AI(institute)

Mentions (1)

papers-typed
paper.md