claim
active

Induction heads work by using K-composition with a previous-token head to shift keys by one token, then matching the current destination token against the shifted keys to predict what follows

The mechanistic explanation of how induction heads are implemented in two-layer models
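
The claim above can be illustrated with a toy NumPy sketch. It is not the paper's implementation: the helper name induction_head_sketch, the one-hot embeddings, the hard-coded previous-token attention pattern, and the sharpness parameter are all assumptions made for illustration. It shows only the shape of the mechanism: a layer-1 head writes each position's previous token into the residual stream, and a layer-2 head builds its keys from that output (K-composition), so a query for the current token lands on the position just after an earlier occurrence of that token.

```python
import numpy as np


def induction_head_sketch(tokens, vocab_size, sharpness=10.0):
    """Toy two-layer, attention-only sketch (hypothetical helper, not from the paper)."""
    tokens = np.asarray(tokens)
    n = len(tokens)
    onehot = np.eye(vocab_size)[tokens]  # (n, vocab_size): current-token identity

    # Layer 1, previous-token head: position i attends only to position i - 1
    # and copies that token's identity forward (assumed hard attention pattern).
    prev_attn = np.eye(n, k=-1)          # ones on the first subdiagonal
    prev_token = prev_attn @ onehot      # row i is one-hot of tokens[i - 1]; row 0 is zero

    # Layer 2, induction head: queries read the current token, keys read the
    # layer-1 output (K-composition), so keys are shifted by one token.
    scores = sharpness * (onehot @ prev_token.T)  # high where tokens[i] == tokens[j - 1]

    # Causal mask: destination i may only attend to sources j <= i.
    causal = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(causal, scores, -np.inf)

    # Row-wise softmax over source positions.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    attn = weights / weights.sum(axis=1, keepdims=True)

    # The value/output path copies the attended token's identity, which acts
    # as the score for the next token.
    next_token_scores = attn @ onehot
    return attn, next_token_scores


tokens = [0, 1, 2, 3, 0, 1, 2]
attn, next_token_scores = induction_head_sketch(tokens, vocab_size=4)
predicted = int(next_token_scores[-1].argmax())
print(predicted)
```

In this example the final token is 2, which last appeared at position 2. The shifted key at position 3 therefore matches the query, the head attends to position 3, and the token found there (3) becomes the prediction.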

Source paper

extracted_from: A Mathematical Framework for Transformer Circuits (2021)

Neighborhood — ranked by edge-count

Findings (2)

finding
