community:leiden_hybrid_concepts-run4-c0-c1-c2
Causal parameter decomposition in neural networks
Isolating interpretable, role-specific model subcomponents through causal analysis and targeted edits to understand mechanistic function.
4 members.
Drawn from 2 sources
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (4)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (2)
- A good parameter subcomponent is causally important only for specific roles and can be removed from the model without hurting performance on irrelevant prompts.
  Definitional principle guiding VPD: subcomponents should encode narrow, targeted computational roles rather than distributed, multi-purpose functionality.
- Parameter subcomponents cleanly isolate true mechanisms of the model.
  Interpretive claim that the subcomponents correspond to real functional units.
Findings (2)
- Attention computations distribute across heads via parameter subcomponents with interpretable roles.
  Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
- The VPD-based edit has off-target effects as low as those of uninterpretable fine-tuning methods.
  Performance comparison showing subcomponent editing is comparable to fine-tuning in preserving off-target behavior.
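
The removal criterion in the first claim can be illustrated with a toy ablation test. This is a minimal sketch, not the VPD procedure itself, which this page does not describe. It assumes a single linear layer whose weight matrix is a sum of rank-one subcomponents; the names `ablate`, `ablation_effect`, `relevant`, and `irrelevant` are hypothetical. A subcomponent meets the criterion when removing it changes outputs on the prompts it serves and leaves other prompts unaffected.

```python
import numpy as np

def ablate(components, drop):
    """Sum all rank-one subcomponents except the one at index `drop`."""
    return sum(np.outer(u, v) for i, (u, v) in enumerate(components) if i != drop)

def ablation_effect(components, drop, inputs):
    """Mean squared change in layer output when subcomponent `drop` is removed."""
    full = sum(np.outer(u, v) for u, v in components)
    ablated = ablate(components, drop)
    diff = inputs @ full.T - inputs @ ablated.T
    return float(np.mean(diff ** 2))

# Toy layer (assumption): two subcomponents that read from orthogonal input directions.
e = np.eye(4)
components = [(e[0], e[0]), (e[1], e[1])]  # (output direction u, input direction v)

# Hypothetical "prompts", represented directly as input activations.
relevant = np.array([[1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]])    # use subcomponent 0
irrelevant = np.array([[0.0, 1.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]])  # do not use it

on_target = ablation_effect(components, 0, relevant)
off_target = ablation_effect(components, 0, irrelevant)
print(f"on-target change: {on_target}, off-target change: {off_target}")
```

In this toy setup the off-target change is zero because the irrelevant inputs are orthogonal to the removed subcomponent's input direction. In a real model the off-target change would be measured empirically on held-out prompts and compared against a baseline such as fine-tuning.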