In-context Learning and Induction Heads

By Catherine Olsson · Nelson Elhage · Neel Nanda · Nicholas Joseph · Nova DasSarma · T. Henighan +4 more

DOI 10.48550/arXiv.2209.11895 · arXiv 2209.11895

Original abstract

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
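The [A][B] ... [A] -> [B] rule and the loss-based view of in-context learning ("decreasing loss at increasing token indices") can be illustrated together. The sketch below is a minimal, hypothetical illustration and not code from the paper: `induction_predict` is an idealized, rule-based stand-in for a trained induction head, and the toy predictor's `confidence` of 0.9, the vocabulary size of 1000, and the positions 50 and 500 compared by `icl_score` are assumptions chosen for illustration. On a random sequence that repeats itself, the rule can only help in the repeated half, so loss falls at later token indices and the score comes out negative.

```python
import math
import random


def induction_predict(tokens, t):
    """Idealized induction rule for [A][B] ... [A] -> [B] (hypothetical helper).

    Looks back from position t for the most recent earlier position j where
    tokens[j] == tokens[t] (prefix matching), then returns tokens[j + 1]
    (copying). Returns None if tokens[t] has not appeared before.
    """
    current = tokens[t]
    for j in range(t - 1, -1, -1):
        if tokens[j] == current:
            return tokens[j + 1]
    return None


def per_token_loss(tokens, vocab_size, confidence=0.9):
    """Cross-entropy (in nats) of a toy predictor that follows the induction rule.

    Entry t is the loss for predicting tokens[t + 1]. With an induction guess,
    the predictor puts `confidence` (an assumed value) on the guess and spreads
    the rest uniformly; without one, it predicts uniformly over the vocabulary.
    """
    losses = []
    for t in range(len(tokens) - 1):
        guess = induction_predict(tokens, t)
        target = tokens[t + 1]
        if guess is None:
            p = 1.0 / vocab_size
        elif guess == target:
            p = confidence
        else:
            p = (1.0 - confidence) / (vocab_size - 1)
        losses.append(-math.log(p))
    return losses


def icl_score(losses, early=50, late=500):
    """Loss at a late position minus loss at an early one (negative = in-context learning).

    The default positions are illustrative assumptions.
    """
    return losses[late] - losses[early]


# Usage: 300 distinct random tokens, followed by the same 300 tokens again.
rng = random.Random(0)
first_half = rng.sample(range(1000), 300)
sequence = first_half + first_half
losses = per_token_loss(sequence, vocab_size=1000)
print(f"loss at t=50:  {losses[50]:.3f}")   # uniform guess, log(1000) -> 6.908
print(f"loss at t=500: {losses[500]:.3f}")  # correct induction guess -> 0.105
print(f"ICL score: {icl_score(losses):.3f}")  # -> -6.802
```

Because every token in the first half is distinct, the rule has nothing to match until the sequence starts repeating; from then on it copies the correct next token, which is exactly the drop in loss at later positions that the score measures.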

Related work (from references, the corpus, and external arXiv search)

Badges (cited, in corpus, arXiv) show which signals surfaced each entry; entries surfaced by multiple sources are weighted higher.

The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
2023
≈ 80%
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo
2025
≈ 79%
On the Emergence of Induction Heads for In-Context Learning
Tiberiu Musat, Tiago Pimentel, Lorenzo Noci, Alessandro Stolfo, Mrinmaya Sachan, Thomas Hofmann
2026
≈ 78%
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C.Y. Chan, Andrew M. Saxe
2024
≈ 77%
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth
2023
≈ 76%
Rethinking Associative Memory Mechanism in Induction Head
Shuo Wang, Issei Sato
2025
≈ 76%
Next-token pretraining implies in-context learning
Paul M. Riechers, Henry R. Bigelow, Eric A. Alt, Adam Shai
2025
≈ 75%
The Dual-Route Model of Induction
Sheridan Feucht, Eric Todd, Byron Wallace, David Bau
2025
≈ 74%
A Mathematical Framework for Transformer Circuits
in corpus
2021
≈ 74%
How Transformers Get Rich: Approximation and Dynamics Analysis
Mingze Wang, Ruoxi Yu, Weinan E, Lei Wu
2025
≈ 73%
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
2024
≈ 73%
The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis
Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He
2024
≈ 72%
What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains
Chanakya Ekbote, Marco Bondaschi, Nived Rajaraman, Jason D. Lee, Michael Gastpar, Ashok Vardhan Makkuva, Paul Pu Liang
2025
≈ 72%
Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning
Mohammed Sabry, Anya Belz
2026
≈ 71%
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Jingcheng Niu, Subhabrata Dutta, Ahmed Elshabrawy, Harish Tayyar Madabushi, Iryna Gurevych
2025
≈ 71%
Selective Induction Heads: How Transformers Select Causal Structures In Context
Francesco D'Angelo, Francesco Croce, Nicolas Flammarion
2025
≈ 71%
Why Learning Requires Feeling
in corpus
2026
≈ 67%
The World Inside Neural Networks
in corpus
2026
≈ 66%
Learning without neurons in physical systems
in corpus
2022
≈ 66%
Steering Along Manifolds to Control Neural Networks
in corpus
≈ 66%
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
in corpus
≈ 65%
The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
in corpus
2026
≈ 65%
Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies
in corpus
2023
≈ 65%
The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring
in corpus
2025
≈ 65%
Zoom In: An Introduction to Circuits
in corpus
2020
≈ 64%
Addressing divergent representations from causal interventions on neural networks
in corpus
2025
≈ 64%
Emergent Introspective Awareness in Large Language Models
in corpus
2026
≈ 64%
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
in corpus
2026
≈ 64%
Alignment faking in large language models
in corpus
2024
≈ 64%
Relating transformers to models and neural representations of the hippocampal formation
in corpus
2021
≈ 64%


Cited by (2)

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Distributed alignment search (DAS) resolves two blocking limitations of prior causal abstraction work—brute-force alignment search and the localist assumption that high-level variables map to disjoint
The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring
Semantic anchoring — the binding of a pretrained model's latent patterns to task-specific targets via external structure — predicts threshold-like performance flips with a single calibrated score S =