claim
active
claim:one-layer-attention-only-transformers-are-an-ensemble-of-bigram-and-skip-trigram-models-whose-parameters-can-be-read-directly-from-weights
One-layer attention-only transformers are an ensemble of bigram and skip-trigram models whose parameters can be read directly from weights
Core claim for one-layer models; the skip-trigram tables can be accessed without running the model
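A minimal sketch of what reading the tables directly from weights could look like, using a toy model with random weights. The matrix names (W_E, W_U, W_Q, W_K, W_V, W_O), their shapes, and the helper functions are assumptions chosen for illustration, not taken from this entry; layer norm, biases, and positional embeddings are ignored. No forward pass is run: the bigram table and the per-head expanded QK and OV matrices are plain products of weight matrices.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def rand_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def expanded_circuits(W_E, W_U, W_Q, W_K, W_V, W_O):
    """Return vocab-by-vocab tables computed from weights alone.

    Assumed shapes: W_E is d_model x n_vocab, W_U is n_vocab x d_model,
    W_Q, W_K, W_V are d_head x d_model, W_O is d_model x d_head.
    """
    # Direct path: bigram[out][cur] is the logit contribution of the current token.
    bigram = matmul(W_U, W_E)
    # QK circuit: qk[dst][src] is the pre-softmax attention score from dst to src.
    qk = matmul(matmul(transpose(W_E), transpose(W_Q)), matmul(W_K, W_E))
    # OV circuit: ov[out][src] is the logit effect on out when src is attended to.
    ov = matmul(matmul(W_U, W_O), matmul(W_V, W_E))
    return bigram, qk, ov

rng = random.Random(0)
n_vocab, d_model, d_head = 5, 4, 2
W_E = rand_matrix(d_model, n_vocab, rng)
W_U = rand_matrix(n_vocab, d_model, rng)
W_Q = rand_matrix(d_head, d_model, rng)
W_K = rand_matrix(d_head, d_model, rng)
W_V = rand_matrix(d_head, d_model, rng)
W_O = rand_matrix(d_model, d_head, rng)

bigram, qk, ov = expanded_circuits(W_E, W_U, W_Q, W_K, W_V, W_O)

# One skip-trigram entry "[src] ... [dst] -> [out]" for this head:
src, dst, out = 1, 3, 2
print("attention score:", qk[dst][src], "logit effect:", ov[out][src])
```

In this sketch a skip-trigram is described by a pair of table lookups: how strongly the destination token attends to the source token, and how attending to that source token shifts the logit of a candidate output token. A real model has one such pair of tables per head, and the tables are far too large to enumerate for a realistic vocabulary without care.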
Neighborhood — ranked by edge-count
Findings (1)
finding
- Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
Questions (1)
question
- The paper explicitly asks and addresses this question, concluding the answer depends on what 'fully understand' means
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core claim for two-layer models; composition creates qualitatively more powerful in-context learning
- The first toy model analyzed; shown to implement an ensemble of bigram and skip-trigram models readable directly from weights
- The primary model analyzed; uses attention head composition, especially K-composition, to create induction heads for powerful in-context learning
- Empirical observation from the specific two-layer model analyzed; no significant V- or Q-composition found
- Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension
- Early example of using mechanistic interpretability to understand unintended model behavior
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.
- Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.