In-Context Learning and Induction Heads (forthcoming paper)

A follow-up paper extending the mathematical framework and the induction head concept to larger, more realistic models

Neighborhood (ranked by edge count)

Papers (2)

paper

Hypotheses (1)

hypothesis

The mathematical framework and induction head concept will remain at least partially relevant for larger, more realistic models
associated_with
Central motivating hypothesis for the forthcoming paper on in-context learning and induction heads

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
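The filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system that produced this page. It assumes every entity already has an embedding vector (the page does not say how vectors are computed), and the helper names `cosine` and `similar_untyped_neighbors`, the toy vectors, and the entity keys are all hypothetical. It keeps entities whose cosine similarity to the target is at least 0.65 and that share no typed edge with it, then sorts them by descending score, matching the order of the list.

```python
import math

# Threshold shown on this page; the embedding model behind the scores is not stated.
SIMILARITY_THRESHOLD = 0.65


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_untyped_neighbors(target, embeddings, typed_edges, threshold=SIMILARITY_THRESHOLD):
    """Hypothetical helper: (name, score) pairs with cosine >= threshold and no
    typed edge to `target` in either direction, sorted by descending score."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Assumption: typed edges are treated as undirected for this check.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for illustration only.
embeddings = {
    "forthcoming_paper": [1.0, 0.0, 0.0],
    "close_claim": [0.9, 0.1, 0.0],
    "linked_hypothesis": [0.8, 0.2, 0.0],
    "unrelated_concept": [0.0, 0.0, 1.0],
}
typed_edges = {("forthcoming_paper", "linked_hypothesis")}

neighbors = similar_untyped_neighbors("forthcoming_paper", embeddings, typed_edges)
print(neighbors)  # only "close_claim" survives: high cosine and no typed edge
```

Here `linked_hypothesis` is similar enough to pass the threshold but is excluded because a typed edge already connects it, which is why entries with typed relations appear in the sections above rather than in this list.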

Induction heads explain in-context learning in small models and only develop in models with at least two attention layers · claim · 0.873
Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
In-Context Learning of Representations (Park et al. 2025) · framework · 0.812
Reports phase-like breakpoints and geometry changes as context scales; UCCT provides a measurable predictor
Large models form many induction heads built from K-composition with a previous token head, making induction heads a central driver of in-context learning at all scales · claim · 0.799
Forward-looking claim connecting toy model findings to large-scale language models
Induction heads in two-layer models successfully perform in-context learning on completely random repeated token sequences far outside the training distribution · finding · 0.792
Strong test of the induction head hypothesis using uniformly sampled random tokens repeated three times
in-context learning (ICL) · concept · 0.776
Test-time adaptation from prompt or retrieved context with no parameter updates.
Induction Heads · concept · 0.772
Mechanistic circuits in transformers documented by Olsson et al. 2022, cited as evidence for the pattern-repository assumption
In-Context Learning as Optimization · concept · 0.771
Induction heads work by using K-composition with a previous token head to shift keys by one token, then matching the current destination token against shifted keys to predict what follows · claim · 0.758
The mechanistic explanation of how induction heads are implemented in two-layer models
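The two-step mechanism in the last entry can be illustrated with a toy two-layer circuit, run on the kind of random repeated-token test described in the finding above (a random segment repeated three times). This is a hand-built sketch, not a trained model and not the paper's code. Assumptions, all hypothetical: one-hot token embeddings, hand-set attention weights, a sharpness factor `beta`, and a segment sampled without replacement so each token has exactly one successor (a simplification of uniform sampling that avoids ambiguous matches). Layer 1 is a previous-token head that attends from position i to i - 1; layer 2 reads its keys from layer 1's output (K-composition), so a query for the current token matches positions whose previous token equals it, and the value copies the token found there.

```python
import numpy as np


def one_hot(tokens, vocab_size):
    """Hypothetical embedding: each token maps to a one-hot row vector."""
    out = np.zeros((len(tokens), vocab_size))
    out[np.arange(len(tokens)), tokens] = 1.0
    return out


def previous_token_head(tok_emb):
    """Layer 1: position i attends only to position i - 1 and copies that token
    into a separate 'previous token' subspace (position 0 receives zeros)."""
    n = tok_emb.shape[0]
    attn = np.eye(n, k=-1)  # attn[i, i - 1] = 1
    return attn @ tok_emb


def induction_head(tok_emb, prev_emb, beta=20.0):
    """Layer 2: query = current token, key = previous-token output of layer 1
    (K-composition), value = token at the attended position."""
    # Score is beta where the token before position j equals the token at i, else 0.
    scores = beta * (tok_emb @ prev_emb.T)
    n = tok_emb.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))  # attend only to j <= i
    scores = np.where(causal, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tok_emb  # scores over the vocabulary for the next token


rng = np.random.default_rng(0)
vocab_size, seg_len, repeats = 500, 25, 3
# Sampled without replacement so each token has a single successor (simplification).
segment = rng.choice(vocab_size, size=seg_len, replace=False)
tokens = np.tile(segment, repeats)  # the random segment repeated three times

tok_emb = one_hot(tokens, vocab_size)
prev_emb = previous_token_head(tok_emb)
pred = induction_head(tok_emb, prev_emb).argmax(axis=1)

# Score next-token predictions only where an earlier copy exists (repeats 2 and 3).
positions = np.arange(seg_len, len(tokens) - 1)
accuracy = (pred[positions] == tokens[positions + 1]).mean()
print(accuracy)  # prints 1.0
```

In the first copy of the segment there is nothing to match, so predictions there are arbitrary; from the second copy onward every match points to the same successor token, so this idealized circuit predicts each evaluated next token correctly. A trained model would only approximate this behavior.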