attention head localization analysis

Analysis measuring whether each attention head's maximum attention increase points to the correct injected sentence
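
As a rough illustration of the measurement described above, the sketch below scores each head by whether the token with the largest attention increase (injected run minus baseline run, read from the final token position) falls inside the injected sentence. The function names, the use of token-level rather than sentence-aggregated increases, and the toy attention values are all assumptions made for this example; the source does not specify them.

```python
from typing import List, Sequence, Tuple


def localize_head(
    baseline: Sequence[float],
    injected: Sequence[float],
    sentence_spans: Sequence[Tuple[int, int]],
) -> int:
    """Return the index of the sentence holding the token whose attention
    weight increased the most between the baseline and injected runs.

    Spans are half-open (start, end) token ranges. Hypothetical helper.
    """
    deltas = [after - before for before, after in zip(baseline, injected)]
    peak_token = max(range(len(deltas)), key=deltas.__getitem__)
    for idx, (start, end) in enumerate(sentence_spans):
        if start <= peak_token < end:
            return idx
    raise ValueError("peak token lies outside every sentence span")


def localization_accuracy(
    heads: List[Tuple[Sequence[float], Sequence[float]]],
    sentence_spans: Sequence[Tuple[int, int]],
    injected_sentence: int,
) -> float:
    """Fraction of heads whose peak attention increase lands in the injected sentence."""
    hits = sum(
        localize_head(base, inj, sentence_spans) == injected_sentence
        for base, inj in heads
    )
    return hits / len(heads)


# Toy data (assumed): three 2-token sentences, injection in sentence 1.
spans = [(0, 2), (2, 4), (4, 6)]
head_a = ([0.2, 0.2, 0.2, 0.2, 0.1, 0.1], [0.1, 0.1, 0.4, 0.2, 0.1, 0.1])
head_b = ([0.3, 0.1, 0.1, 0.1, 0.2, 0.2], [0.3, 0.1, 0.1, 0.1, 0.1, 0.3])

accuracy = localization_accuracy([head_a, head_b], spans, injected_sentence=1)
print(accuracy)
```

In this toy case the first head's largest increase falls in sentence 1 and the second head's falls in sentence 2, so one of the two heads localizes correctly.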

Neighborhood — ranked by edge-count

paper

concept

attention-based signal routing
implements
Mechanism by which attention heads detect injected perturbations and route information about them to the final token position

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Attention heads · concept · 0.839
Transformer attention heads that could be recruited to extract different kinds of information (text vs. thoughts).
attention computation · concept · 0.776
Process that uses queries (Q), keys (K), and values (V) to compute an attention pattern (a heat map) over the keys and a weighted sum of the values.
Virtual Attention Head · concept · 0.770
The composition of two attention heads via V-composition, forming a new entity with its own attention pattern A^h2 * A^h1 and OV matrix W_OV^h2 * W_OV^h1
Attention algorithms are usually distributed across attention heads · claim · 0.756
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
Attention heads can be understood as independent operations each adding their output to the residual stream, equivalent to the concatenate-and-multiply formulation · claim · 0.756
Mathematical equivalence enabling independent analysis of each attention head
Talking Heads Attention · concept · 0.745
A transformer variant where OV and QK matrices of different attention heads can share components, enabling shared copying mechanisms
All 32 attention heads at layer 3 achieve 100% localization accuracy for injections at layer 2 (5-way classification, 20% chance) · finding · 0.743
Striking mechanistic finding that injection creates a universally detectable perturbation in the residual stream immediately downstream.
Attention Schema · concept · 0.741
A predictive model representing and controlling attention; central to attention schema theory.
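
One claim in the neighborhood above, that summing each head's output into the residual stream is equivalent to the concatenate-and-multiply formulation, can be checked numerically. The sketch below uses made-up head outputs and a made-up output matrix (both assumptions for illustration): it multiplies the concatenated head outputs by the full matrix, then compares that with the sum of each head's output multiplied by its own block of rows.

```python
import math


def vec_mat(v, m):
    """Multiply a row vector v (length n) by a matrix m (n rows, k columns)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


d_head = 2
# Assumed toy values: two heads, each producing a d_head-dimensional output.
head_outputs = [[1.0, 2.0], [3.0, -1.0]]
# Assumed output projection: (n_heads * d_head) rows, d_model = 3 columns.
W_O = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, -1.0],
    [0.5, 0.5, 0.0],
    [2.0, 0.0, 1.0],
]

# Formulation 1: concatenate all head outputs, then multiply by W_O.
concat = [x for head in head_outputs for x in head]
concat_result = vec_mat(concat, W_O)

# Formulation 2: each head multiplies its own row block of W_O;
# the per-head results are then summed, as if added to the residual stream.
per_head = [
    vec_mat(head, W_O[i * d_head:(i + 1) * d_head])
    for i, head in enumerate(head_outputs)
]
summed = [sum(column) for column in zip(*per_head)]

assert all(math.isclose(a, b) for a, b in zip(concat_result, summed))
print(concat_result, summed)
```

The equality holds because a block-partitioned matrix product is the sum of the products of the matching blocks, which is what lets each head be analyzed on its own.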