Distributed attention head decomposition

A mechanistic interpretability approach that decomposes attention computations into query and key subcomponents with distinct algorithmic roles.

5 members.

Drawn from 2 sources

The papers and notes whose extracted claims and findings make up this cluster.

A pair of query and key subcomponents distributed across attention heads performs previous-token behavior
VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.

A pair of query and key subcomponents distributed across attention heads performs syntax-boundary routing
VPD recovers an attention algorithm for routing across syntactic boundaries, distributed across heads.

Attention computations distribute across heads via parameter subcomponents with interpretable roles
Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.

Identification of algorithms implemented in attention layers, distributed across attention heads
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.

Attention algorithms are usually distributed across attention heads
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in a footnote.
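
To make "distributed across attention heads" concrete, the following is a minimal sketch, not VPD itself. It assumes a query subcomponent can be represented as a rank-one matrix over the concatenated per-head columns of a query weight matrix; that representation, the toy dimensions, and the helper name `head_mass` are all hypothetical choices for illustration. The sketch measures how much of the subcomponent's squared norm falls inside each head's column slice, so a subcomponent confined to one head and one spread over several heads can be told apart.

```python
import numpy as np

# Toy dimensions (assumptions for illustration only).
n_heads, d_head, d_model = 4, 2, 6
rng = np.random.default_rng(0)

# Hypothetical rank-one query subcomponent u v^T, shaped like W_Q:
# rows index the residual stream, columns are the heads' slices concatenated.
u = rng.normal(size=d_model)
v = rng.normal(size=n_heads * d_head)
subcomponent = np.outer(u, v)

def head_mass(component, n_heads, d_head):
    """Fraction of the squared Frobenius norm in each head's column slice."""
    # Column c belongs to head c // d_head, so a C-order reshape groups by head.
    per_head = component.reshape(component.shape[0], n_heads, d_head)
    squared = (per_head ** 2).sum(axis=(0, 2))
    return squared / squared.sum()

# A subcomponent with weight in every head's slice: mass is spread out.
mass = head_mass(subcomponent, n_heads, d_head)

# For contrast, a subcomponent confined to head 1 only.
v_single = np.zeros(n_heads * d_head)
v_single[d_head:2 * d_head] = 1.0
single_mass = head_mass(np.outer(u, v_single), n_heads, d_head)

print("spread over heads:", np.round(mass, 3))
print("single head:      ", np.round(single_mass, 3))
```

Under these assumptions, a per-head analysis would see only a fraction of the first subcomponent in any one head, which is one way to read the claim that such algorithms are recovered in weight space rather than head by head.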