finding

active

finding:all-induction-heads-in-the-two-layer-model-occupy-an-extreme-corner-of-high-positive-qk-and-ov-eigenvalue-positivity-space-relative-to-non-induction-heads

All induction heads in the two-layer model occupy an extreme corner of high QK eigenvalue positivity and high OV eigenvalue positivity relative to non-induction heads

Quantitative verification of the mechanistic theory; both circuits required for the induction algorithm show the predicted copying/matching structure
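
The eigenvalue positivity referred to above can be sketched as the ratio of the sum of a matrix's eigenvalues to the sum of their absolute values, which is the summary statistic used in the source paper. The sketch below is a minimal illustration, not the paper's code: the function name eigenvalue_positivity and the toy matrices are assumptions, and in the paper the statistic is computed on the expanded circuit matrices (products of embedding, head, and unembedding weights) rather than on small hand-written examples.

```python
import numpy as np


def eigenvalue_positivity(matrix: np.ndarray) -> float:
    """Return sum(lambda_i) / sum(|lambda_i|) over the eigenvalues of a square matrix.

    The value lies in [-1, 1]. Values near +1 mean the eigenvalues are
    almost all positive (copying or matching behavior); values near -1
    mean they are almost all negative.
    Hypothetical helper name, introduced only for illustration.
    """
    eigenvalues = np.linalg.eigvals(matrix)
    total_magnitude = np.abs(eigenvalues).sum()
    if total_magnitude == 0:
        return 0.0
    # For a real matrix, complex eigenvalues come in conjugate pairs,
    # so the imaginary parts cancel and the sum is real.
    return float(eigenvalues.sum().real / total_magnitude)


# Toy matrices (assumptions, not model weights).
copying_like = np.eye(4)                       # all eigenvalues +1
anti_copying = -np.eye(4)                      # all eigenvalues -1
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])  # eigenvalues +i and -i

print(eigenvalue_positivity(copying_like))
print(eigenvalue_positivity(anti_copying))
print(eigenvalue_positivity(rotation))
```

Under this measure, a head in the "extreme corner" is one for which both the QK-side and the OV-side matrices score close to +1.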

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Neighborhood — ranked by edge-count

Claims (2)

claim

Induction heads work by using K-composition with a previous token head to shift keys by one token, then matching the current destination token against shifted keys to predict what follows
associated_with · supports
The mechanistic explanation of how induction heads are implemented in two-layer models
Induction heads explain in-context learning in small models and only develop in models with at least two attention layers
supports
Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
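
The behavior described in the first claim above can be sketched at the token level: each earlier position is keyed by the token that preceded it (the "shifted key"), the current token is matched against those keys, and the token at the matching position is copied as the prediction. This is a minimal behavioral sketch, not the model's attention computation; the function name induction_predict and the choice to prefer the most recent match are assumptions made for illustration.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[int]) -> Optional[int]:
    """Predict the next token with a hard-coded induction rule.

    Position i is treated as having the key tokens[i - 1], mimicking the
    one-token key shift that a previous token head provides. The current
    (last) token is the query. On a match at position i, tokens[i] is
    copied as the prediction. Returns None when nothing matches.
    Hypothetical helper, introduced only for illustration.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan from the most recent position back to position 1; position 0
    # has no previous token and therefore no shifted key.
    for i in range(len(tokens) - 1, 0, -1):
        if tokens[i - 1] == current:
            return tokens[i]
    return None


# The sequence ... A B ... A should yield B.
print(induction_predict([5, 7, 9, 5]))
```

Because the rule depends only on token identity and position, it also applies to random repeated sequences, which is the setting the related random-token finding tests.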

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

All induction heads fall in an extreme corner of high OV eigenvalue positivity and high QK eigenvalue positivity, confirming the mechanistic theory · claim · 0.861
Quantitative verification that the copying and matching structure predicted by the mechanistic theory is present in all observed induction heads
10 out of 12 attention heads in the 12-head one-layer model show significantly positive eigenvalue sums, indicating copying behavior · finding · 0.771
Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
Attention heads with positive projection on reflection direction are sparse and located mostly in deeper layers of DeepSeek-R1-Qwen-1.5B · finding · 0.766
Structural finding about which attention heads control reflection behavior
In the analyzed two-layer model, second-layer attention head terms dominate the loss reduction compared to first-layer terms and the direct path · finding · 0.764
Result from term importance analysis breaking down loss contribution by layer
In the analyzed two-layer attention-only model, only K-composition is significant; V- and Q-composition are negligible by Frobenius norm measure · finding · 0.762
Result from applying the Frobenius norm composition measurement to all attention head pairs in the two-layer model
Induction heads in two-layer models successfully perform in-context learning on completely random repeated token sequences far outside training distribution · finding · 0.758
Strong test of the induction head hypothesis using uniformly sampled random tokens repeated three times
2D projections of activations show clearly separable clusters for F0-F2 and A1 at layer 25, but increasingly entangled activations for F4-F5 and A2-A3 · finding · 0.751
Visual geometric evidence for the fundamental entanglement of true/false activations in harder tasks.
Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior · claim · 0.751
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying