finding

active

finding:induction-heads-in-two-layer-models-successfully-perform-in-context-learning-on-completely-random-repeated-token-sequences-far-outside-training-distribution

Induction heads in two-layer models successfully perform in-context learning on completely random repeated token sequences, far outside the training distribution

A strong test of the induction-head hypothesis: sequences of uniformly sampled random tokens, repeated three times
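As a rough illustration of why this test is informative, the sketch below builds a segment of uniformly sampled token ids, repeats it three times, and scores an idealized induction rule: find the most recent earlier occurrence of the current token and predict whatever followed it. This is not the paper's model or evaluation code. The vocabulary size, segment length, seed, and function names are hypothetical choices. Because the rule never consults token statistics, it should predict nearly every token in the second and third copies even though the segment itself is pure noise.

```python
import random


def repeated_random_tokens(vocab_size, seg_len, repeats=3, seed=0):
    """Uniformly sample seg_len token ids (with replacement) and repeat the segment."""
    rng = random.Random(seed)
    segment = [rng.randrange(vocab_size) for _ in range(seg_len)]
    return segment * repeats


def induction_predict(tokens, t):
    """Idealized induction rule (hypothetical helper): locate the most recent earlier
    occurrence of tokens[t] and return the token that followed it, or None."""
    for j in range(t - 1, -1, -1):
        if tokens[j] == tokens[t]:
            return tokens[j + 1]
    return None


def repeat_accuracy(tokens, seg_len):
    """Fraction of positions in the 2nd and later copies where the rule predicts the next token."""
    positions = range(seg_len, len(tokens) - 1)
    hits = sum(induction_predict(tokens, t) == tokens[t + 1] for t in positions)
    return hits / len(positions)


# Hypothetical sizes: 50,000-token vocabulary, 100-token segment, three copies.
tokens = repeated_random_tokens(vocab_size=50_000, seg_len=100)
print(f"induction accuracy on repeats: {repeat_accuracy(tokens, 100):.3f}")
```

Sampling with replacement occasionally duplicates a token inside the segment, which can send the rule to the wrong earlier occurrence, so accuracy is close to, but not guaranteed to equal, 1.0.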

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Neighborhood (ranked by edge count)

Claims (3)

claim

Induction heads explain in-context learning in small models and only develop in models with at least two attention layers
supports
Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
Induction heads work by using K-composition with a previous token head to shift keys by one token, then matching the current destination token against shifted keys to predict what follows
associated_with
The mechanistic explanation of how induction heads are implemented in two-layer models
Two-layer attention-only transformers implement much more complex algorithms via composition of attention heads, detectable directly from weights
supports
Core claim for two-layer models; composition creates qualitatively more powerful in-context learning
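The induction mechanism described in these claims can be sketched with hand-set weights instead of learned ones. This toy version assumes one-hot token embeddings and an identity OV circuit, simplifications that are not taken from the paper. A first-layer previous-token head writes token t-1 into position t; the induction head's keys read that shifted information (K-composition); its query is the current token; and the attended position's own token is copied out as the prediction. The sharpness parameter `beta` and the function name are hypothetical.

```python
import numpy as np


def induction_head(tokens, vocab_size, beta=10.0):
    """Toy two-layer induction circuit with hand-set weights (hypothetical sketch)."""
    E = np.eye(vocab_size)[tokens]          # (T, V) one-hot token embeddings
    T = len(tokens)
    # Layer 1: previous-token head; position j receives the embedding of token j-1.
    prev = np.zeros_like(E)
    prev[1:] = E[:-1]
    # Layer 2 (K-composition): keys read the previous-token head's output,
    # so key_j encodes tokens[j-1]; the query encodes the current token tokens[t].
    keys, queries = prev, E
    scores = beta * queries @ keys.T        # (T, T) current token vs shifted keys
    causal = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(causal, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # OV circuit as identity copy: output the attended position's own token.
    logits = weights @ E
    return logits.argmax(axis=1)


seq = [3, 1, 4, 0, 2] * 3
preds = induction_head(seq, vocab_size=5)
print(preds[5:14], seq[6:15])   # predictions in copies 2-3 vs. actual next tokens
```

Attention lands on the position just after an earlier occurrence of the current token, so copying that position's token yields the next-token prediction. Nothing in the circuit depends on which tokens appear, which is why it transfers to random sequences.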

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Large models form many induction heads built from K-composition with a previous token head, making induction heads a central driver of in-context learning at all scales · claim · 0.833
Forward-looking claim connecting toy model findings to large-scale language models
In-Context Learning and Induction Heads (forthcoming paper) · concept · 0.792
A follow-up paper extending the framework and the induction-head concept to larger, more realistic models
All induction heads in the two-layer model sit in an extreme corner of QK and OV eigenvalue-positivity space, with both values far more positive than those of non-induction heads · finding · 0.758
Quantitative verification of the mechanistic theory; both circuits required for the induction algorithm show the predicted copying/matching structure
All 32 attention heads at layer 3 achieve 100% localization accuracy for injections at layer 2 (5-way classification; chance is 20%) · finding · 0.758
Striking mechanistic finding that injection creates a universally detectable perturbation in the residual stream immediately downstream
Dictionary learning on a model with randomly shuffled weights produces mainly single-token, poorly interpretable features · finding · 0.757
Controls for dataset structure, showing trained model activations have richer structure than data distribution alone
One-layer model attention heads encode Python-specific skip-trigrams, including indentation-based elif/else prediction and function-signature patterns · finding · 0.749
Concrete example from examining expanded QK/OV matrices, showing how specific programming-language structure is encoded in attention weights
Independently trained model families converge on a common semantic manifold under self-referential processing, suggesting an attractor dynamic that transcends training variance · hypothesis · 0.746
Hypothesis tested in Experiment 3; independently trained GPT, Claude, and Gemini architectures converge on similar descriptive vocabulary
Token-level supervision enables models to learn functional-token invocation from reasoning context · claim · 0.742
ATLAS author's assertion that functional tokens optimized via standard cross-entropy loss learn when and how to invoke operations from surrounding text.
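The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched as a filter over precomputed embedding vectors. The page does not say how the embeddings are produced or stored, so the function name, its arguments, and the toy vectors are hypothetical; only the threshold and the exclusion of typed neighbors come from the page. Results are sorted by descending similarity, matching the order of the scores listed above.

```python
import numpy as np


def similarity_candidates(query_vec, entity_vecs, entity_ids, exclude, threshold=0.65):
    """Return (id, cosine) pairs with cosine >= threshold, skipping ids in `exclude`
    (the entity itself and anything already linked by a typed edge), highest first."""
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    M = np.asarray(entity_vecs, dtype=float)
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    sims = M @ q                            # cosine similarity of each entity to the query
    hits = [(eid, float(s)) for eid, s in zip(entity_ids, sims)
            if s >= threshold and eid not in exclude]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-d embeddings; "a" is already connected by a typed edge.
ids = ["a", "b", "c", "d"]
vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 0.1]]
print(similarity_candidates([1.0, 0.0], vecs, ids, exclude={"a"}))
```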