claim
active
claim:each-attention-head-has-two-largely-independent-computations-a-qk-circuit-computing-the-attention-pattern-and-an-ov-circuit-computing-the-effect-if-attended-to

Each attention head has two largely independent computations: a QK circuit computing the attention pattern and an OV circuit computing the effect if attended to

Key decomposition: it lets the two questions be analyzed separately, namely where a head attends (QK circuit) and what it writes to the output once it attends (OV circuit)
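
The decomposition can be checked numerically. The following is a minimal sketch, not code from the source paper: it assumes a single head with no causal mask, a row-vector convention (tokens are rows of X), and random weights. The names W_QK, W_OV, attention_pattern, and the other helpers are illustrative. It shows that the attention pattern depends only on the product W_Q W_K^T, that the value written depends only on the product W_V W_O, and that combining the two reproduces the usual query/key/value computation.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def attention_pattern(X, W_Q, W_K):
    """QK circuit: where each token attends. Uses only W_Q and W_K."""
    d_head = len(W_Q[0])
    W_QK = matmul(W_Q, transpose(W_K))               # d_model x d_model
    scores = matmul(matmul(X, W_QK), transpose(X))   # n_tokens x n_tokens
    scale = 1.0 / math.sqrt(d_head)
    return softmax_rows([[s * scale for s in row] for row in scores])


def head_output_circuits(X, W_Q, W_K, W_V, W_O):
    """Head output as (attention pattern) applied to (X through the OV circuit)."""
    A = attention_pattern(X, W_Q, W_K)
    W_OV = matmul(W_V, W_O)                          # d_model x d_model
    return matmul(A, matmul(X, W_OV))


def head_output_standard(X, W_Q, W_K, W_V, W_O):
    """Usual formulation with explicit query, key, and value vectors."""
    Q, K, V = matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V)
    scale = 1.0 / math.sqrt(len(W_Q[0]))
    scores = matmul(Q, transpose(K))
    A = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(matmul(A, V), W_O)


rng = random.Random(0)                 # fixed seed so the run is repeatable
n_tokens, d_model, d_head = 4, 6, 2    # toy sizes chosen for illustration
X = rand_matrix(n_tokens, d_model, rng)
W_Q = rand_matrix(d_model, d_head, rng)
W_K = rand_matrix(d_model, d_head, rng)
W_V = rand_matrix(d_model, d_head, rng)
W_O = rand_matrix(d_head, d_model, rng)

A = attention_pattern(X, W_Q, W_K)
out_circuits = head_output_circuits(X, W_Q, W_K, W_V, W_O)
out_standard = head_output_standard(X, W_Q, W_K, W_V, W_O)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(out_circuits, out_standard)
               for a, b in zip(row_a, row_b))
print("max difference between formulations:", max_diff)
```

In this sketch attention_pattern never receives W_V or W_O, and W_OV never involves W_Q or W_K, which is the sense in which the two circuits can be studied separately. The pattern A still multiplies the OV result, so the computations are "largely" rather than fully independent.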

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)

Neighborhood — ranked by edge-count

Claims (3)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.