concept
active
concept:one-layer-attention-only-transformer
One-Layer Attention-Only Transformer
The first toy model analyzed; shown to implement an ensemble of bigram and skip-trigram models readable directly from weights
Neighborhood — ranked by edge-count
Concepts (3)
- Attention-Only Transformer · related_to · A simplified transformer variant without MLP layers, used as the primary subject of mechanistic analysis in this paper
- Two-Layer Attention-Only Transformer · related_to · The primary model analyzed; uses attention head composition, especially K-composition, to create induction heads for powerful in-context learning
- Skip-Trigram · implements · A three-token pattern of the form [source]...[destination][out] that one-layer attention heads implement; the paper's key characterization of one-layer transformer behavior
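
The "readable directly from weights" claim can be illustrated with a small NumPy sketch. It assumes the usual weight-shape conventions for a single attention head (embedding `W_E`, unembedding `W_U`, and per-head `W_Q`, `W_K`, `W_V`, `W_O`), and it ignores positional terms, biases, layer norm, and the softmax scaling. The function name, the toy sizes, and the random weights are all hypothetical; nothing here comes from the paper's trained model.

```python
import numpy as np

def expanded_circuits(W_E, W_U, W_Q, W_K, W_V, W_O):
    """Token-by-token tables for one attention head (illustrative sketch).

    Assumed shapes: W_E (d_model, n_vocab), W_U (n_vocab, d_model),
    W_Q, W_K, W_V (d_head, d_model), W_O (d_model, d_head).
    """
    direct = W_U @ W_E              # [out, current]: bigram-like direct path
    qk = W_E.T @ W_Q.T @ W_K @ W_E  # [destination, source]: pre-softmax, unscaled score
    ov = W_U @ W_O @ W_V @ W_E      # [out, source]: logit effect if source is attended to
    return direct, qk, ov

# Toy, randomly initialised weights (hypothetical sizes, not a trained model).
rng = np.random.default_rng(0)
n_vocab, d_model, d_head = 7, 4, 2
W_E = rng.normal(size=(d_model, n_vocab))
W_U = rng.normal(size=(n_vocab, d_model))
W_Q, W_K, W_V = (rng.normal(size=(d_head, d_model)) for _ in range(3))
W_O = rng.normal(size=(d_model, d_head))

direct, qk, ov = expanded_circuits(W_E, W_U, W_Q, W_K, W_V, W_O)

# One skip-trigram [source] ... [destination][out], given as token ids.
source, destination, out = 3, 5, 1
attn_score = qk[destination, source]  # how strongly destination attends to source
logit_effect = ov[out, source]        # how attending to source shifts the logit of out
```

A skip-trigram is favoured by a head when both entries are large and positive. Both tables are computed without running the model on any input, which is the sense in which they can be read from the weights.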
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The paper explicitly asks and addresses this question, concluding the answer depends on what 'fully understand' means
- Core claim for two-layer models; composition creates qualitatively more powerful in-context learning
- Core claim for one-layer models; the skip-trigram tables can be accessed without running the model
- Empirical observation from the specific two-layer model analyzed; no significant V- or Q-composition found
- A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
- Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension
- Interpretive claim from attention head attribution analysis in appendix
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
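
One way to quantify the copying behaviour mentioned in the last item is to check how positive the eigenvalues of an expanded OV matrix are. The sketch below assumes the summary statistic Σλ_i / Σ|λ_i| over those eigenvalues; the function name `copying_score` and the toy matrices are illustrative assumptions, not the paper's code or data.

```python
import numpy as np

def copying_score(M):
    """Return sum(eigenvalues) / sum(|eigenvalues|) for a square matrix M.

    Values near +1 suggest mostly positive eigenvalues (copying-like),
    values near -1 suggest mostly negative ones. Undefined if every
    eigenvalue is zero (division by zero).
    """
    eigvals = np.linalg.eigvals(M)
    return float(eigvals.sum().real / np.abs(eigvals).sum())

# Toy matrices standing in for an expanded OV circuit (hypothetical).
pure_copy = np.eye(4)            # every token raises its own logit
anti_copy = -np.eye(4)           # every token lowers its own logit
rotation = np.array([[0.0, -1.0],
                     [1.0, 0.0]])  # purely imaginary eigenvalues

scores = [copying_score(m) for m in (pure_copy, anti_copy, rotation)]
```

On a real head the matrix would be the vocabulary-sized product of unembedding, OV, and embedding weights, so a practical version would likely work with a smaller matrix that shares its nonzero eigenvalues.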