claim

active

claim:one-layer-attention-only-transformers-are-an-ensemble-of-bigram-and-skip-trigram-models-whose-parameters-can-be-read-directly-from-weights

One-layer attention-only transformers are an ensemble of bigram and skip-trigram models whose parameters can be read directly from weights

Core claim for one-layer models: the bigram and skip-trigram tables can be computed from the weights alone, without running the model on any input
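
A minimal sketch of what "read directly from weights" can look like, assuming the source paper's shape conventions: `W_E` is (d_model, n_vocab), `W_U` is (n_vocab, d_model), per-head `W_Q`, `W_K`, `W_V` are (d_head, d_model), and `W_O` is (d_model, d_head). The function names `direct_path` and `expand_head` are hypothetical, random weights stand in for a trained model, and positional terms are omitted for simplicity.

```python
import numpy as np

def direct_path(W_U, W_E):
    """Bigram table: entry [out, cur] is the direct logit effect of
    the current token `cur` on the next-token candidate `out`."""
    return W_U @ W_E

def expand_head(W_E, W_U, W_Q, W_K, W_V, W_O):
    """Expand one attention head into two vocab-by-vocab tables."""
    W_QK = W_Q.T @ W_K        # (d_model, d_model), low rank
    W_OV = W_O @ W_V          # (d_model, d_model), low rank
    qk = W_E.T @ W_QK @ W_E   # [destination, source]: pre-softmax attention score
    ov = W_U @ W_OV @ W_E     # [output, source]: logit effect if source is attended
    return qk, ov

# Assumption: small random weights, used only to show the shapes involved.
rng = np.random.default_rng(0)
n_vocab, d_model, d_head = 50, 16, 4
W_E = rng.normal(size=(d_model, n_vocab))
W_U = rng.normal(size=(n_vocab, d_model))
W_Q, W_K, W_V = (rng.normal(size=(d_head, d_model)) for _ in range(3))
W_O = rng.normal(size=(d_model, d_head))

bigram = direct_path(W_U, W_E)
qk, ov = expand_head(W_E, W_U, W_Q, W_K, W_V, W_O)
```

No forward pass appears anywhere in the sketch: every table is a product of weight matrices. With a realistic vocabulary the full tables are very large, so it may be more practical to compute only the rows or columns for tokens of interest.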

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Neighborhood — ranked by edge-count

Findings (1)

finding

One-layer model attention heads encode Python-specific skip-trigrams including indentation-based elif/else prediction and function signature patterns
supports
Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
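
As a rough illustration of how skip-trigrams of the form [source] ... [destination] → [output] might be listed from expanded QK/OV matrices, the sketch below picks, for one source token, the destination tokens with the largest QK entries and the output tokens with the largest OV entries. The helper `top_skip_trigrams`, the five-token vocabulary, and every number in the matrices are hand-made assumptions chosen to mimic the shape of an indentation-based pattern; none of them are values from the paper or from a trained model.

```python
import numpy as np

def top_skip_trigrams(qk, ov, vocab, src, n_dest=1, n_out=2):
    """List (source, destination, output) triples for one source token.

    qk[dest, src] is the pre-softmax attention score from dest to src;
    ov[out, src] is the logit boost for out when src is attended to.
    """
    s = vocab.index(src)
    dests = np.argsort(qk[:, s])[::-1][:n_dest]  # strongest attenders first
    outs = np.argsort(ov[:, s])[::-1][:n_out]    # most-boosted outputs first
    return [(src, vocab[d], vocab[o]) for d in dests for o in outs]

# Hypothetical toy vocabulary and hand-set matrix entries.
vocab = ["INDENT_DEEP", "INDENT_SHALLOW", "else", "elif", "foo"]
qk = np.zeros((5, 5))
ov = np.zeros((5, 5))
qk[1, 0] = 3.0  # INDENT_SHALLOW attends strongly to INDENT_DEEP
ov[2, 0] = 2.0  # attending to INDENT_DEEP boosts "else"
ov[3, 0] = 1.5  # and, less strongly, "elif"

trigrams = top_skip_trigrams(qk, ov, vocab, "INDENT_DEEP")
```

A large QK entry is only suggestive on its own, since the attention actually paid to a source token depends on the softmax over all other tokens in the context.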

Questions (1)

question

Do we 'fully understand' one-layer attention-only transformers?
answered_by
The paper explicitly asks and addresses this question, concluding the answer depends on what 'fully understand' means

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Two-layer attention-only transformers implement much more complex algorithms via composition of attention heads, detectable directly from weights · claim · 0.852
Core claim for two-layer models; composition creates qualitatively more powerful in-context learning
One-Layer Attention-Only Transformer · concept · 0.836
The first toy model analyzed; shown to implement an ensemble of bigram and skip-trigram models readable directly from weights
Two-Layer Attention-Only Transformer · concept · 0.800
The primary model analyzed; uses attention head composition, especially K-composition, to create induction heads for powerful in-context learning
In small two-layer attention-only transformers, the only significant composition is K-composition between a single first-layer head and some second-layer heads · claim · 0.773
Empirical observation from the specific two-layer model analyzed; no significant V- or Q-composition found
We revealed the one-layer attention-only model to be a compressed Chinese room, and we're left with a giant pile of cards. · quote · 0.752
Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension
Skip-trigram bugs in one-layer models demonstrate interpretability can reveal and characterize specific model failure modes · claim · 0.750
Early example of using mechanistic interpretability to understand unintended model behavior
Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.750
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Transformers use an anti-Markovian solution that recomputes relevant numeric information at each step in the Multi-Object task · claim · 0.746
Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.