claim
active
claim:large-models-form-many-induction-heads-built-from-k-composition-with-a-previous-token-head-making-induction-heads-a-central-driver-of-in-context-learning-at-all-scales

Large models form many induction heads built from K-composition with a previous-token head, making induction heads a central driver of in-context learning at all scales

Forward-looking claim connecting toy-model findings to large-scale language models

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)
