claim

active

claim:each-attention-head-has-two-largely-independent-computations-a-qk-circuit-computing-the-attention-pattern-and-an-ov-circuit-computing-the-effect-if-attended-to

Each attention head has two largely independent computations: a QK circuit computing the attention pattern and an OV circuit computing the effect if attended to

Key decomposition enabling separate analysis of where attention goes and what it does
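
The decomposition can be illustrated with a minimal NumPy sketch. This is not code from the source paper. It assumes a single causal head, tokens stored as rows, and the usual 1/sqrt(d_head) score scaling; the function names and shapes are hypothetical. It checks that attention computed from explicit query, key, and value vectors matches attention computed from the merged matrices W_QK = W_Q^T W_K and W_OV = W_O W_V.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_scores(scores):
    # Mask out positions j > i so each token attends only to itself and earlier tokens.
    n = scores.shape[0]
    return np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)

def head_standard(X, W_Q, W_K, W_V, W_O):
    # Usual formulation: build q, k, v vectors explicitly.
    d_head = W_Q.shape[0]
    q, k, v = X @ W_Q.T, X @ W_K.T, X @ W_V.T
    A = softmax(causal_scores(q @ k.T / np.sqrt(d_head)))
    return A, (A @ v) @ W_O.T

def head_circuits(X, W_Q, W_K, W_V, W_O):
    # Circuit formulation: q, k, v never appear, only the merged matrices.
    d_head = W_Q.shape[0]
    W_QK = W_Q.T @ W_K   # [d_model, d_model]: decides where attention goes
    W_OV = W_O @ W_V     # [d_model, d_model]: decides what an attended token writes
    A = softmax(causal_scores(X @ W_QK @ X.T / np.sqrt(d_head)))
    return A, A @ X @ W_OV.T

# Random stand-in weights and inputs (assumed shapes, for illustration only).
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 6, 32, 8
X = rng.normal(size=(n_tokens, d_model))
W_Q = rng.normal(size=(d_head, d_model))
W_K = rng.normal(size=(d_head, d_model))
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))

A_std, out_std = head_standard(X, W_Q, W_K, W_V, W_O)
A_circ, out_circ = head_circuits(X, W_Q, W_K, W_V, W_O)
print(np.allclose(A_std, A_circ), np.allclose(out_std, out_circ))
```

In this sketch W_QK affects the output only through the pattern A, and W_OV affects it only once A is fixed, which is the sense in which the two computations can be analyzed separately.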

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Neighborhood — ranked by edge-count

Claims (3)

claim

Naive interpretation of attention patterns can be both informative and fundamentally misleading when Q-, K-, or V-composition is present
supports
Response to the 'attention as explanation' critique; the paper provides a typology of when attention is and isn't directly interpretable
Attention heads can be understood as independent operations each adding their output to the residual stream, equivalent to the concatenate-and-multiply formulation
supports
Mathematical equivalence enabling independent analysis of each attention head
Key, query, and value vectors are intermediary byproducts; W_OV and W_QK are the fundamental low-rank matrices describing attention head behavior
supports
Reframing observation: the canonical K/Q/V decomposition is computationally convenient but not the most interpretable representation
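
The equivalence between the concatenate-and-multiply formulation and the sum of per-head outputs can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: the per-head results and the output matrix are random stand-ins, and all names and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_heads, d_head, d_model = 5, 4, 8, 32

# Per-head results (attention-weighted value vectors), one [n_tokens, d_head] array per head.
r = rng.normal(size=(n_heads, n_tokens, d_head))

# Output matrix of the concatenated formulation: [d_model, n_heads * d_head].
W_O = rng.normal(size=(d_model, n_heads * d_head))

# Formulation 1: concatenate all head results, then multiply by W_O once.
concat = np.concatenate(list(r), axis=-1)      # [n_tokens, n_heads * d_head]
out_concat = concat @ W_O.T                    # [n_tokens, d_model]

# Formulation 2: split W_O into per-head column blocks and sum the per-head outputs.
W_O_blocks = np.split(W_O, n_heads, axis=1)    # each block is [d_model, d_head]
out_sum = sum(r[h] @ W_O_blocks[h].T for h in range(n_heads))

print(np.allclose(out_concat, out_sum))
```

Because the two results agree, each head can be treated as adding its own term to the residual stream, with its own block of W_O.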

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Attention algorithms are usually distributed across attention heads · claim · 0.803
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
Introspection relies on general-purpose computational mechanisms (attention-based anomaly detection and residual stream dynamics) rather than specialized introspection circuits · claim · 0.785
Interpretive claim about the mechanistic substrate of introspection in LLMs
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.784
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
Attention computations distribute across heads via parameter subcomponents with interpretable roles · finding · 0.779
Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
Concordance heads (QK circuits) could serve as the consistency-checking circuit for distinguishing intended vs. unintended outputs · hypothesis · 0.778
Speculated mechanism for prefill detection.
How can mechanistic interpretability methods automatically identify attention computations that span multiple attention heads? · question · 0.775
Long-standing bottleneck in mechanistic interpretability that VPD addresses by working natively on attention weight matrices.
One-layer model attention heads encode Python-specific skip-trigrams including indentation-based elif/else prediction and function signature patterns · finding · 0.775
Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
In the analyzed two-layer model, second-layer attention head terms dominate the loss reduction compared to first-layer terms and the direct path · finding · 0.774
Result from term importance analysis breaking down loss contribution by layer