concept
active
concept:induction-heads · Induction Heads
Mechanistic circuits in transformers documented by Olsson et al. 2022, cited as evidence for the pattern-repository assumption.
Neighborhood — ranked by edge-count
Papers (1)
Thinkers (1)
- Chris Olah · introduces · Co-author; provided high-level research guidance and wrote the introduction and discussion.
Concepts (6)
- in-context learning (ICL) · associated_with, implements · Test-time adaptation from prompt or retrieved context with no parameter updates.
- Previous Token Head · associated_with, implements · An attention head that primarily attends to the immediately preceding token; a key building block for induction heads via K-composition.
- Skip-Trigram · extends · A three-token pattern of the form [source]...[destination][out] that one-layer attention heads implement; the paper's key characterization of one-layer transformer behavior.
- K-Composition · implements · A form of attention head composition where W_K reads from a subspace affected by a previous head; central to how induction heads are implemented.
- Unlabeled statistical regularities stored during pretraining.
- Two-Layer Attention-Only Transformer · implements · The primary model analyzed; uses attention head composition, especially K-composition, to create induction heads for powerful in-context learning.
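
The way the entries above fit together (a previous-token head feeding an induction head through K-composition) can be illustrated with a toy sketch. This is an illustrative assumption, not code from the cited paper: it replaces soft attention with exact token matching, and the names `previous_token_positions` and `induction_predict` are hypothetical. It shows only the resulting behavior, completing `[A][B] ... [A]` with `[B]`.

```python
from typing import List, Optional, Sequence


def previous_token_positions(n: int) -> List[Optional[int]]:
    """Toy previous-token head: position i looks at position i - 1.

    Position 0 has no predecessor, so it maps to None.
    """
    return [i - 1 if i > 0 else None for i in range(n)]


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy induction rule: find an earlier copy of the current token
    and predict the token that followed it.

    Hard matching stands in for attention here. In a real model the
    match is a soft attention pattern; this sketch assumes the key at
    position j carries the identity of token j - 1 (the K-composition
    step), and the query is the current token.
    """
    if not tokens:
        return None
    prev = previous_token_positions(len(tokens))
    current = tokens[-1]
    # Scan candidate positions from most recent to oldest.
    for j in range(len(tokens) - 1, 0, -1):
        p = prev[j]
        # "Key" at j = token before j; "query" = current token.
        if p is not None and tokens[p] == current:
            # Copy step: output the token stored at the attended position.
            return tokens[j]
    return None


if __name__ == "__main__":
    context = ["Mr", "D", "urs", "ley", "was", "Mr", "D"]
    print(induction_predict(context))
```

In this sketch the prediction uses no stored statistics about the tokens themselves, only the context, which is why the behavior is associated with in-context learning.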
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- A follow-up paper extending the framework and the induction head concept to larger, more realistic models
- Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
- The mechanistic explanation of how induction heads are implemented in two-layer models
- Forward-looking claim connecting toy model findings to large-scale language models
- Emerging framework that seeks invariants between evolution and learning; cited as future direction.
- Transformer attention heads that could be recruited to extract different kinds of information (text vs. thoughts).
- QK circuit heads hypothesized to measure likelihood of an output given prior activations, used in prefill detection.