claim
active
claim:large-models-form-many-induction-heads-built-from-k-composition-with-a-previous-token-head-making-induction-heads-a-central-driver-of-in-context-learning-at-all-scales
Large models form many induction heads built from K-composition with a previous token head, making induction heads a central driver of in-context learning at all scales
Forward-looking claim connecting toy model findings to large-scale language models
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
- The mechanistic explanation of how induction heads are implemented in two-layer models
- Strong test of the induction head hypothesis using uniformly sampled random tokens repeated three times
- A follow-up paper extending the framework and the induction head concept to larger, more realistic models
- GPT-2 implements at least one induction head using pointer arithmetic on positional embeddings rather than K-composition · hypothesis · 0.781 · Observation of an alternative induction head implementation algorithm in larger models with positional embeddings in the residual stream
- The mathematical framework and induction head concept will remain at least partially relevant for larger, more realistic models · hypothesis · 0.777 · Central motivating hypothesis for the forthcoming paper on in-context learning and induction heads
- The Primer architecture's depthwise convolution change would allow induction heads to form without requiring K-composition · hypothesis · 0.777 · Architectural interpretation of how Primer's design change relates to the paper's mechanistic theory of induction heads
- Quantitative verification that the copying and matching structure predicted by the mechanistic theory is present in all observed induction heads