community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run2-c45
Distributed attention head decomposition
Mechanistic interpretability approach decomposing attention heads into query/key subcomponents with distinct algorithmic roles
5 members.
Drawn from 2 sources
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (6)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
- Mechanistic interpretability & model evaluation (5 shared)
- Mechanistic structure of transformer attention computations (4 shared)
- Distributed computation across attention heads (3 shared)
- Emergence through distributed attention and uncertainty (1 shared)
- Mechanistic interpretability via parameter decomposition (1 shared)
- Causal parameter decomposition in neural networks (1 shared)
Findings (4)
- A pair of query and key subcomponents distributed across attention heads performs previous-token behavior: VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
- A pair of query and key subcomponents distributed across attention heads performs syntax-boundary routing: VPD recovers an attention algorithm for routing across syntactic boundaries, distributed across heads.
- Attention computations distribute across heads via parameter subcomponents with interpretable roles: a mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
- Identification of algorithms implemented in attention layers, distributed across attention heads: VPD recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
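To make "a query/key subcomponent pair distributed across heads" concrete, here is a hand-built toy in NumPy. It is not VPD and uses no weights from the source. It assumes one-hot positional embeddings as the residual stream, gives each head a query slice equal to a scaled shift matrix and an identity key slice, and omits the usual 1/sqrt(d_head) scaling. The helpers `build_prev_token_pair` and `per_head_patterns` and the `head_scales` values are hypothetical.

```python
import numpy as np


def causal_softmax(scores):
    """Row-wise softmax restricted to positions j <= i (causal mask)."""
    n = scores.shape[0]
    mask = np.tril(np.ones((n, n), dtype=bool))
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)


def build_prev_token_pair(n_pos, head_scales):
    """Hypothetical query/key subcomponent pair spanning len(head_scales) heads.

    Head h gets a d_head = n_pos column slice: the query slice is a shift
    matrix scaled by head_scales[h], the key slice is the identity.
    """
    shift = np.eye(n_pos, k=-1)  # shift[i, i - 1] = 1 for i >= 1
    q_sub = np.concatenate([s * shift for s in head_scales], axis=1)
    k_sub = np.concatenate([np.eye(n_pos) for _ in head_scales], axis=1)
    return q_sub, k_sub


def per_head_patterns(x, q_sub, k_sub, n_heads):
    """Attention pattern each head computes from its own slice of the pair."""
    d_head = q_sub.shape[1] // n_heads
    patterns = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q = x @ q_sub[:, cols]
        k = x @ k_sub[:, cols]
        patterns.append(causal_softmax(q @ k.T))  # 1/sqrt(d_head) scaling omitted
    return patterns


n_pos, head_scales = 8, [6.0, 4.0, 0.0]
x = np.eye(n_pos)  # toy residual stream: one-hot positional embeddings only
q_sub, k_sub = build_prev_token_pair(n_pos, head_scales)
patterns = per_head_patterns(x, q_sub, k_sub, len(head_scales))

# Mean attention mass each head places on the previous token (positions 1..n-1).
prev_mass = [float(np.mean([p[i, i - 1] for i in range(1, n_pos)])) for p in patterns]
print([round(m, 3) for m in prev_mass])
```

Heads that receive a larger slice of the pair concentrate more attention on position i-1, while the head with scale 0 attends uniformly over its causal window. In the source, such subcomponents are recovered by VPD in weight space rather than constructed by hand as here.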
Claims (1)
- Attention algorithms are usually distributed across attention heads: claim supported by VPD's recovery of cross-head attention subcomponents (noted in a footnote).
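One way to make "distributed across heads" checkable for a recovered subcomponent is to ask how its weight norm splits over head-sized column blocks. The source does not say how distribution was measured, so this criterion is an assumption for illustration. `head_norm_shares` and `is_distributed` are hypothetical helpers. The sketch also assumes each head's columns are stored contiguously, and its 0.1 threshold is arbitrary.

```python
import numpy as np


def head_norm_shares(w_sub, n_heads):
    """Share of a subcomponent's squared Frobenius norm held by each head.

    Assumes w_sub has shape (d_model, n_heads * d_head) with each head's
    columns stored contiguously, head 0 first.
    """
    d_model, width = w_sub.shape
    if width % n_heads:
        raise ValueError("width must be divisible by n_heads")
    per_head = w_sub.reshape(d_model, n_heads, width // n_heads)
    energy = np.sum(per_head ** 2, axis=(0, 2))
    total = energy.sum()
    if total == 0:
        raise ValueError("subcomponent has zero norm")
    return energy / total


def is_distributed(w_sub, n_heads, threshold=0.1):
    """Hypothetical criterion: two or more heads each hold >= threshold of the norm."""
    return int(np.sum(head_norm_shares(w_sub, n_heads) >= threshold)) >= 2


rng = np.random.default_rng(0)
d_model, n_heads, d_head = 16, 4, 4

# Subcomponent living entirely in head 1's columns.
single = np.zeros((d_model, n_heads * d_head))
single[:, d_head:2 * d_head] = rng.normal(size=(d_model, d_head))

# Subcomponent spread over heads 0 and 1 (heads 2 and 3 masked to zero).
head_mask = np.repeat([1.0, 1.0, 0.0, 0.0], d_head)
spread = rng.normal(size=(d_model, n_heads * d_head)) * head_mask

print(head_norm_shares(single, n_heads), is_distributed(single, n_heads))
print(head_norm_shares(spread, n_heads), is_distributed(spread, n_heads))
```

Norm share only shows where a subcomponent's weights sit. Whether each head's slice is functionally necessary would need a causal test such as ablation.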