community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c0-c0-c2
Distributed computation across attention heads
Studies how query, key, and value components decompose into specialized subfunctions across heads, enabling routing and token prediction behaviors.
4 members.
Drawn from 2 sources
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (2)
- Attention algorithms are usually distributed across attention heads. Claim supported by VPD's recovery of cross-head attention subcomponents, noted in a footnote.
- Q, K, and V function as an information-routing mechanism: Q queries the past, K signals to future attention, and V carries the selectively routed information. Janus's interpretive model of how attention mechanisms enable deliberate information flow and selective routing.
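The routing reading of Q, K, and V can be made concrete with a minimal single-head causal attention sketch. This is ordinary scaled dot-product attention, not code from either source; the function names (`causal_attention`, `softmax`, `dot`) and the toy vectors are illustrative assumptions. Each query only scores keys at the same or earlier positions, each key is what a position exposes to later queries, and the value is the content that gets mixed into the output.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_attention(Q, K, V):
    """Single-head causal attention over lists of equal-length vectors.

    Returns (outputs, patterns); patterns[t] holds the weights that
    position t places on positions 0..t.
    """
    d = len(Q[0])
    outputs, patterns = [], []
    for t in range(len(Q)):
        # The query at t looks only at the past (and itself).
        scores = [dot(Q[t], K[s]) / math.sqrt(d) for s in range(t + 1)]
        weights = softmax(scores)
        # Values are mixed according to the routing weights.
        out = [sum(w * V[s][j] for s, w in enumerate(weights))
               for j in range(len(V[0]))]
        outputs.append(out)
        patterns.append(weights)
    return outputs, patterns

# Toy example (hypothetical vectors): position 2's query matches position 1's key.
K = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q = [[0.0, 0.0], [0.0, 0.0], [10.0, 0.0]]
outputs, patterns = causal_attention(Q, K, V)
print(patterns[2])  # most of the weight lands on position 1
print(outputs[2])   # so the output is close to V[1]
```

In this toy case the key at position 1 is the only one aligned with the final query, so nearly all of the information routed to position 2 comes from the value at position 1.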
Findings (2)
- A pair of query and key subcomponents distributed across attention heads performs previous-token behavior. VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
- A pair of query and key subcomponents distributed across attention heads performs syntax-boundary routing. VPD recovers an attention algorithm for routing across syntactic boundaries, distributed across heads.
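One simple way to see previous-token behavior spread over several heads is to measure, per head, how much attention each position puts on the position immediately before it. The sketch below is a generic diagnostic and not VPD's recovery procedure; the function names and the two toy attention patterns are hypothetical. Each pattern is a list of rows, where row t holds the weights that position t places on positions 0..t.

```python
def previous_token_score(pattern):
    """Mean attention weight that each position t >= 1 places on position t - 1."""
    if len(pattern) < 2:
        raise ValueError("need at least two positions")
    weights = [pattern[t][t - 1] for t in range(1, len(pattern))]
    return sum(weights) / len(weights)

def per_head_scores(patterns_by_head):
    # One previous-token score per head, in head order.
    return [previous_token_score(p) for p in patterns_by_head]

# Hypothetical patterns for two heads over a three-token sequence.
head_a = [[1.0], [0.9, 0.1], [0.05, 0.9, 0.05]]
head_b = [[1.0], [0.5, 0.5], [0.4, 0.3, 0.3]]

scores = per_head_scores([head_a, head_b])
print(scores)
```

If the behavior is distributed, several heads show a non-trivial score rather than a single head carrying all of it. This per-head view is only a proxy: the findings above concern query and key subcomponents in parameter space, which a pattern-level score does not identify on its own.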