claim

active

claim:all-induction-heads-fall-in-an-extreme-corner-of-high-ov-eigenvalue-positivity-and-high-qk-eigenvalue-positivity-confirming-the-mechanistic-theory

All induction heads fall in an extreme corner of high OV eigenvalue positivity and high QK eigenvalue positivity, confirming the mechanistic theory

Quantitative verification that the copying and matching structure predicted by the mechanistic theory is present in all observed induction heads
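
As a rough illustration of what "eigenvalue positivity" can mean here, the sketch below assumes it is summarized as the sum of a matrix's eigenvalues divided by the sum of their magnitudes. This gives a score near +1 when the eigenvalues are mostly positive (copying or matching behavior) and near -1 when they are mostly negative. The function name `eigenvalue_positivity` and the small example matrices are hypothetical stand-ins, not the model's actual OV or QK circuit weights.

```python
import numpy as np


def eigenvalue_positivity(matrix):
    """Return sum(lambda_i) / sum(|lambda_i|) for a square matrix.

    Assumption: this ratio is used as the summary statistic for how
    "positive" a circuit's eigenvalues are. It lies in [-1, 1].
    """
    eigenvalues = np.linalg.eigvals(np.asarray(matrix, dtype=float))
    total_magnitude = np.abs(eigenvalues).sum()
    if total_magnitude == 0:
        # All eigenvalues are zero (e.g. a nilpotent matrix): no signal.
        return 0.0
    # For a real matrix, complex eigenvalues occur in conjugate pairs,
    # so the imaginary parts cancel in the sum up to rounding error.
    return float(eigenvalues.sum().real / total_magnitude)


# Hypothetical stand-ins for a head's circuit matrices.
copying_like = np.eye(4)                          # every eigenvalue is +1
anti_copying_like = -np.eye(4)                    # every eigenvalue is -1
rotation_like = np.array([[0.0, -1.0],
                          [1.0, 0.0]])            # eigenvalues are +i and -i

ov_score = eigenvalue_positivity(copying_like)
anti_score = eigenvalue_positivity(anti_copying_like)
rotation_score = eigenvalue_positivity(rotation_like)
```

Under this assumed statistic, a head would sit in the "extreme corner" described by the claim when both its OV score and its QK score are close to +1.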

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

All induction heads in the two-layer model occupy an extreme corner of high positive QK and OV eigenvalue positivity space relative to non-induction heads · finding · 0.861
Quantitative verification of the mechanistic theory; both circuits required for the induction algorithm show the predicted copying/matching structure
Large models form many induction heads built from K-composition with a previous token head, making induction heads a central driver of in-context learning at all scales · claim · 0.769
Forward-looking claim connecting toy model findings to large-scale language models
Induction heads work by using K-composition with a previous token head to shift keys by one token, then matching the current destination token against shifted keys to predict what follows · claim · 0.768
The mechanistic explanation of how induction heads are implemented in two-layer models
Induction heads explain in-context learning in small models and only develop in models with at least two attention layers · claim · 0.761
Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
Attention heads with positive projection on reflection direction are sparse and located mostly in deeper layers of DeepSeek-R1-Qwen-1.5B · finding · 0.753
Structural finding about which attention heads control reflection behavior
H2: The conditions necessary for the induction of deep models, familiar in connectionist models of learning and cognition, are predictive of the conditions necessary for an ETI to occur. · hypothesis · 0.751
Second hypothesis linking learning theory directly to evolutionary transitions
The mathematical framework and induction head concept will remain at least partially relevant for larger, more realistic models · hypothesis · 0.747
Central motivating hypothesis for the forthcoming paper on in-context learning and induction heads
Evolutionary transitions in individuality constitute a form of deep model induction. · claim · 0.744
Links ETIs to the learning of hierarchical representations.