claim
active
Each attention head has two largely independent computations: a QK circuit computing the attention pattern and an OV circuit computing the effect if attended to
Key decomposition enabling separate analysis of where attention goes and what it does
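The decomposition can be checked numerically. Below is a minimal NumPy sketch. It assumes a single head with no biases, no layer norm and no causal mask, and it uses a row-vector convention (tokens are rows of `x`). The helper names `head_standard` and `head_qk_ov` are hypothetical. The sketch shows that the attention pattern depends only on the product W_QK = W_Q W_Kᵀ, and that once the pattern is fixed, the output depends only on W_OV = W_V W_O.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def head_standard(x, W_Q, W_K, W_V, W_O):
    # Canonical form: project to queries, keys and values, then attend
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    A = softmax(q @ k.T / np.sqrt(W_Q.shape[1]))
    return A @ v @ W_O, A

def head_qk_ov(x, W_Q, W_K, W_V, W_O):
    # QK circuit: a single d_model x d_model bilinear form sets the attention pattern
    W_QK = W_Q @ W_K.T
    A = softmax(x @ W_QK @ x.T / np.sqrt(W_Q.shape[1]))
    # OV circuit: a single d_model x d_model map applied to every attended-to token
    W_OV = W_V @ W_O
    return A @ (x @ W_OV), A

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 4
s = d_model ** -0.5  # scale weights to keep scores in a moderate range
x = rng.normal(size=(n_tokens, d_model))
W_Q, W_K, W_V = (rng.normal(scale=s, size=(d_model, d_head)) for _ in range(3))
W_O = rng.normal(scale=s, size=(d_head, d_model))

out_a, A_a = head_standard(x, W_Q, W_K, W_V, W_O)
out_b, A_b = head_qk_ov(x, W_Q, W_K, W_V, W_O)
print(np.allclose(out_a, out_b), np.allclose(A_a, A_b))  # True True

# Replacing W_V and W_O alters the OV circuit but leaves the pattern untouched
W_V2 = rng.normal(scale=s, size=(d_model, d_head))
W_O2 = rng.normal(scale=s, size=(d_head, d_model))
_, A_c = head_standard(x, W_Q, W_K, W_V2, W_O2)
print(np.allclose(A_a, A_c))  # True
```

Both W_QK and W_OV are d_model × d_model, but each has rank at most d_head. This is why the factored Q/K/V form is cheaper to compute, while the product form is easier to read.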
Neighborhood (ranked by edge count)
Claims (3)
claim
- Response to the 'attention as explanation' critique; the paper provides a typology of when attention is and isn't directly interpretable
- Mathematical equivalence enabling independent analysis of each attention head
- Reframing observation: the canonical K/Q/V decomposition is computationally convenient but not the most interpretable representation
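The mathematical equivalence that allows each head to be analyzed independently can be sketched in NumPy. Concatenating the per-head results and applying one shared output matrix W_O gives the same answer as summing independent per-head outputs, where each head uses its own row block of W_O. The sketch uses random weights and assumes no biases and no causal mask. `head_values` is a hypothetical helper.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n_tokens, d_model, n_heads, d_head = 6, 32, 4, 8
s = d_model ** -0.5
x = rng.normal(size=(n_tokens, d_model))
W_Q = rng.normal(scale=s, size=(n_heads, d_model, d_head))
W_K = rng.normal(scale=s, size=(n_heads, d_model, d_head))
W_V = rng.normal(scale=s, size=(n_heads, d_model, d_head))
W_O = rng.normal(scale=s, size=(n_heads * d_head, d_model))  # one shared output matrix

def head_values(h):
    # Attention-weighted values for head h, shape (n_tokens, d_head)
    A = softmax((x @ W_Q[h]) @ (x @ W_K[h]).T / np.sqrt(d_head))
    return A @ (x @ W_V[h])

# Canonical implementation: concatenate per-head results, then apply W_O once
concat_out = np.concatenate([head_values(h) for h in range(n_heads)], axis=-1) @ W_O

# Equivalent form: split W_O into per-head row blocks and sum independent head outputs
per_head = [head_values(h) @ W_O[h * d_head:(h + 1) * d_head] for h in range(n_heads)]
sum_out = np.sum(per_head, axis=0)

print(np.allclose(concat_out, sum_out))  # True
```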
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
- Interpretive claim about the mechanistic substrate of introspection in LLMs
- Identification of algorithms implemented in attention layers, distributed across attention heads (finding, cosine 0.784): VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Attention computations distribute across heads via parameter subcomponents with interpretable roles (finding, cosine 0.779): Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
- Concordance heads (QK circuits) could serve as the consistency-checking circuit for distinguishing intended vs. unintended outputs (hypothesis, cosine 0.778): Speculated mechanism for prefill detection.
- Long-standing bottleneck in mechanistic interpretability that VPD addresses by working natively on attention weight matrices.
- Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
- Result from term importance analysis breaking down loss contribution by layer
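The expanded QK/OV matrices mentioned above can be sketched by multiplying a head's circuits through the token embedding and unembedding. The result is a pair of vocabulary-by-vocabulary tables that can be inspected directly. This sketch assumes a one-layer, attention-only model with no positional embeddings, biases or layer norm, so it captures only the direct path. All weights are random placeholders, and the names `full_qk` and `full_ov` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab, d_model, d_head = 50, 16, 4
s = d_model ** -0.5
W_E = rng.normal(size=(vocab, d_model))            # token embeddings, one row per token
W_U = rng.normal(scale=s, size=(d_model, vocab))   # unembedding to logits
W_Q, W_K, W_V = (rng.normal(scale=s, size=(d_model, d_head)) for _ in range(3))
W_O = rng.normal(scale=s, size=(d_head, d_model))

# Expanded QK circuit: [i, j] = unscaled attention score of query token i on key token j
full_qk = W_E @ (W_Q @ W_K.T) @ W_E.T
# Expanded OV circuit: [j, k] = direct change to logit k from attending fully to token j
full_ov = W_E @ (W_V @ W_O) @ W_U

# Spot-check one entry of each table against the step-by-step computation
i, j, k = 3, 7, 11
assert np.isclose(full_qk[i, j], (W_E[i] @ W_Q) @ (W_E[j] @ W_K))
assert np.isclose(full_ov[j, k], ((W_E[j] @ W_V) @ W_O) @ W_U[:, k])
print(full_qk.shape, full_ov.shape)  # (50, 50) (50, 50)
```

With trained weights, large entries in row `full_qk[i]` would indicate which key tokens query token i prefers. Large entries in row `full_ov[j]` would indicate which output tokens are promoted when token j is attended to. Both tables have rank at most d_head.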