Causal parameter decomposition in neural networks

Isolating interpretable, role-specific model subcomponents through causal analysis and targeted edits to understand mechanistic function.

4 members.

Drawn from 2 sources

The papers/notes whose extracted claims & findings make up this cluster.

A good parameter subcomponent is causally important only for specific roles and can be removed from the model without hurting performance on irrelevant prompts. Definitional principle guiding VPD: subcomponents should encode narrow, targeted computational roles rather than distributed, multi-purpose functionality.
Parameter subcomponents cleanly isolate true mechanisms of the model. Interpretive claim that the subcomponents correspond to real functional units.
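
The removability criterion in the first claim above can be illustrated with a small ablation check. This is a minimal sketch under stated assumptions, not the method used in the sources: the weight matrix is assumed to be a sum of rank-one subcomponents, the model is a single linear map, and the names `ablate`, `mean_output_shift`, `read_dirs`, and `write_dirs` are hypothetical. It removes one subcomponent and compares how far outputs move on inputs that use its role versus inputs that do not.

```python
import numpy as np

def ablate(components, index):
    """Rebuild the weight matrix from every subcomponent except `index`."""
    kept = [c for i, c in enumerate(components) if i != index]
    return np.sum(kept, axis=0)

def mean_output_shift(w_full, w_ablated, inputs):
    """Mean L2 distance between full and ablated outputs over a batch."""
    diff = inputs @ w_full.T - inputs @ w_ablated.T
    return float(np.linalg.norm(diff, axis=1).mean())

# Hypothetical toy decomposition: W = sum_k outer(write_dirs[k], read_dirs[k]).
rng = np.random.default_rng(0)
d_in, d_out = 4, 3
read_dirs = np.eye(d_in)[:2]              # subcomponent k reads only input feature k
write_dirs = rng.normal(size=(2, d_out))  # arbitrary output directions
components = [np.outer(write_dirs[k], read_dirs[k]) for k in range(2)]

w_full = np.sum(components, axis=0)
w_ablated = ablate(components, index=0)

# "Relevant" inputs activate feature 0; "irrelevant" inputs leave it at zero.
relevant = rng.normal(size=(16, d_in))
irrelevant = relevant.copy()
irrelevant[:, 0] = 0.0

shift_relevant = mean_output_shift(w_full, w_ablated, relevant)
shift_irrelevant = mean_output_shift(w_full, w_ablated, irrelevant)
print(f"output shift on relevant inputs:   {shift_relevant:.3f}")
print(f"output shift on irrelevant inputs: {shift_irrelevant:.3f}")
```

In this toy setup the ablated subcomponent changes outputs only on inputs that activate its feature. In a real network, the analogous comparison would presumably use task loss on relevant versus irrelevant prompts instead of raw output distance.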

Attention computations are distributed across heads via parameter subcomponents with interpretable roles. Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
The VPD-based edit has off-target effects as low as those of uninterpretable fine-tuning methods. Performance comparison showing that subcomponent editing is comparable to fine-tuning in preserving off-target behavior.
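
One way to quantify the off-target effects mentioned in the last claim is to compare the original and edited models' next-token distributions on prompts unrelated to the edit. The sources do not say which metric they use; the sketch below assumes mean KL divergence, and `mean_kl` and the random logits are hypothetical placeholders, not measured results.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kl(ref_logits, edited_logits):
    """Mean KL(ref || edited) over a batch of next-token distributions."""
    ref_logp = log_softmax(ref_logits)
    edit_logp = log_softmax(edited_logits)
    kl = (np.exp(ref_logp) * (ref_logp - edit_logp)).sum(axis=-1)
    return float(kl.mean())

# Placeholder logits standing in for model outputs on off-target prompts.
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 50))                         # original model
perturbed = ref + 0.5 * rng.normal(size=ref.shape)     # some edited model

kl_identical = mean_kl(ref, ref)       # an edit that changes nothing off-target
kl_perturbed = mean_kl(ref, perturbed) # an edit that disturbs off-target outputs
print(f"KL with no off-target change: {kl_identical:.6f}")
print(f"KL with perturbed outputs:    {kl_perturbed:.6f}")
```

Under this assumed metric, two editing methods would count as having similarly low off-target effects if their mean KL values on the same off-target prompt set are close to each other and close to zero.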