paper:doi-10-48550-arxiv-2209-11895
In-context Learning and Induction Heads
Original abstract
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
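The completion rule named in the abstract, [A][B] ... [A] -> [B], can be sketched in plain Python. This is only a minimal illustration of the input/output behavior, not of how an attention head computes it. The function names `induction_predict` and `in_context_score`, and the parameters `early` and `late`, are assumptions introduced here for illustration; the second function simply mirrors the abstract's description of in-context learning as loss decreasing at later token indices.

```python
from typing import Hashable, Optional, Sequence


def induction_predict(tokens: Sequence[Hashable]) -> Optional[Hashable]:
    """Predict the next token with the [A][B] ... [A] -> [B] rule.

    Finds the most recent earlier occurrence of the final token and
    returns the token that followed it. Returns None when the final
    token has not appeared earlier in the sequence.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


def in_context_score(per_token_loss: Sequence[float], early: int, late: int) -> float:
    """Loss at a late token index minus loss at an early one.

    A negative value means the loss fell as more context became
    available. Both indices are chosen by the caller (hypothetical
    parameters, not values taken from the abstract).
    """
    return per_token_loss[late] - per_token_loss[early]


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
    print(in_context_score([3.0, 2.0, 1.0], early=0, late=2))
```

Scanning from the most recent position backwards is one possible tie-breaking choice when a token has appeared several times; the abstract does not specify one.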
Related work: refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows are weighted higher.
- The mechanistic basis of data dependence and abrupt learning in an in-context classification task. Gautam Reddy, 2023. ≈ 80%
- Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence. Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo, 2025. ≈ 79%
- On the Emergence of Induction Heads for In-Context Learning. Tiberiu Musat, Tiago Pimentel, Lorenzo Noci, Alessandro Stolfo, Mrinmaya Sachan, Thomas Hofmann, 2026. ≈ 78%
- What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation. Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C.Y. Chan, Andrew M. Saxe, 2024. ≈ 77%
- Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale. Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth, 2023. ≈ 76%
- Next-token pretraining implies in-context learning. Paul M. Riechers, Henry R. Bigelow, Eric A. Alt, Adam Shai, 2025. ≈ 75%
- A Mathematical Framework for Transformer Circuits [in corpus], 2021. ≈ 74%
- How Transformers Get Rich: Approximation and Dynamics Analysis. Mingze Wang, Ruoxi Yu, Weinan E, Lei Wu, 2025. ≈ 73%
- Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers. Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang, 2024. ≈ 73%
- The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis. Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He, 2024. ≈ 72%
- What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains. Chanakya Ekbote, Marco Bondaschi, Nived Rajaraman, Jason D. Lee, Michael Gastpar, Ashok Vardhan Makkuva, Paul Pu Liang, 2025. ≈ 72%
- Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning. Mohammed Sabry, Anya Belz, 2026. ≈ 71%
- Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning. Jingcheng Niu, Subhabrata Dutta, Ahmed Elshabrawy, Harish Tayyar Madabushi, Iryna Gurevych, 2025. ≈ 71%
- Selective Induction Heads: How Transformers Select Causal Structures In Context. Francesco D'Angelo, Francesco Croce, Nicolas Flammarion, 2025. ≈ 71%
- Why Learning Requires Feeling [in corpus], 2026. ≈ 67%
- The World Inside Neural Networks [in corpus], 2026. ≈ 66%
- Learning without neurons in physical systems [in corpus], 2022. ≈ 66%
- Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies [in corpus], 2023. ≈ 65%
- Zoom In: An Introduction to Circuits [in corpus], 2020. ≈ 64%
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior [in corpus], 2026. ≈ 64%
- Alignment faking in large language models [in corpus], 2024. ≈ 64%
- Relating transformers to models and neural representations of the hippocampal formation [in corpus], 2021. ≈ 64%
Cited by (2)
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint
- The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring
Semantic anchoring — the binding of a pretrained model's latent patterns to task-specific targets via external structure — predicts threshold-like performance flips with a single calibrated score S =