claim
active
claim:two-layer-attention-only-transformers-implement-much-more-complex-algorithms-via-composition-of-attention-heads-detectable-directly-from-weights
Two-layer attention-only transformers implement much more complex algorithms via composition of attention heads, detectable directly from weights
Core claim for two-layer models; composition creates qualitatively more powerful in-context learning
Neighborhood — ranked by edge-count
Findings (1)
finding
- Strong test of the induction head hypothesis using uniformly sampled random tokens repeated three times
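The test input described in this finding can be sketched as follows. This is a minimal illustration, not the original experiment's code: the function names, vocabulary size, block length, and seed are all assumptions, and only the "uniformly sampled random tokens repeated three times" construction comes from the finding itself. The second helper lists, for each position in a later repeat, the earlier position an induction head would be expected to attend to (the token right after the previous occurrence of the current token).

```python
import random


def repeated_random_tokens(vocab_size, block_len, repeats=3, seed=0):
    """Sample one block of token ids uniformly at random, then repeat it.

    vocab_size, block_len, and seed are illustrative assumptions;
    the finding only specifies uniform sampling and three repeats.
    """
    rng = random.Random(seed)
    block = [rng.randrange(vocab_size) for _ in range(block_len)]
    return block * repeats


def induction_targets(tokens, block_len):
    """Map each position i in the 2nd/3rd repeat to the position an
    induction head should attend to: one past the previous occurrence
    of the current token, i.e. i - block_len + 1."""
    return {i: i - block_len + 1 for i in range(block_len, len(tokens))}


tokens = repeated_random_tokens(vocab_size=50000, block_len=10)
targets = induction_targets(tokens, block_len=10)
```

Because the tokens are random, bigram or skip-trigram statistics cannot help predict the next token; only copying from the earlier repeat can, which is why the attended-to token at `targets[i]` equals the true next token `tokens[i + 1]`.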
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The paper explicitly asks and addresses this question, concluding the answer depends on what 'fully understand' means
- The primary model analyzed; uses attention head composition, especially K-composition, to create induction heads for powerful in-context learning
- Core claim for one-layer models; the skip-trigram tables can be accessed without running the model
- Empirical observation from the specific two-layer model analyzed; no significant V- or Q-composition found
- The first toy model analyzed; shown to implement an ensemble of bigram and skip-trigram models readable directly from weights
- Result from term importance analysis breaking down loss contribution by layer
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
- Identification of algorithms implemented in attention layers, distributed across attention heads
finding · 0.782
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.