claim
active
claim:one-layer-attention-only-transformers-are-an-ensemble-of-bigram-and-skip-trigram-models-whose-parameters-can-be-read-directly-from-weights

One-layer attention-only transformers are an ensemble of bigram and skip-trigram models whose parameters can be read directly from weights

Core claim for one-layer models: the skip-trigram tables can be read off the weight matrices without running the model.
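The claim can be illustrated with a minimal numpy sketch. It assumes a toy one-layer attention-only model with no positional embeddings, no layer norm, no biases, and the 1/sqrt(d_head) scale folded into W_Q. The sizes, the random weights, and the helper names (`read_tables`, `logits_from_tables`, `top_skip_trigrams`) are hypothetical. The sketch reads three kinds of table straight from the weights. The first is the direct-path bigram table W_U W_E. The other two are computed per head: the QK table W_E^T W_Q^T W_K W_E and the OV table W_U W_O W_V W_E. It then checks that the bigram logits plus the attention-weighted OV columns reproduce an ordinary forward pass. Pairing each high-QK source token with its top OV output is just one simple way to list skip-trigrams. It is not necessarily the paper's exact procedure.

```python
import numpy as np


def softmax_causal(scores):
    """Softmax over source positions (columns), masking sources after each destination row."""
    n = scores.shape[0]
    masked = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)


def forward_logits(tokens, W_E, W_U, W_Q, W_K, W_V, W_O):
    """Ordinary forward pass: embed, add every head's output to the residual, unembed."""
    X = W_E[:, tokens]                      # residual stream, one column per position
    resid = X.copy()
    for h in range(W_Q.shape[0]):           # one layer: every head reads the same X
        A = softmax_causal((W_Q[h] @ X).T @ (W_K[h] @ X))   # A[dst, src]
        resid = resid + W_O[h] @ (W_V[h] @ X) @ A.T
    return W_U @ resid                      # shape (n_vocab, n_positions)


def read_tables(W_E, W_U, W_Q, W_K, W_V, W_O):
    """Read the bigram, QK and OV tables directly from the weights (no forward pass)."""
    n_heads = W_Q.shape[0]
    bigram = W_U @ W_E                                                # [out, cur]
    qk = [W_E.T @ W_Q[h].T @ W_K[h] @ W_E for h in range(n_heads)]    # [dst, src]
    ov = [W_U @ W_O[h] @ W_V[h] @ W_E for h in range(n_heads)]        # [out, src]
    return bigram, qk, ov


def logits_from_tables(tokens, bigram, qk, ov):
    """Rebuild the logits as the bigram path plus attention-weighted OV columns per head."""
    t = np.asarray(tokens)
    out = bigram[:, t]
    for qk_h, ov_h in zip(qk, ov):
        A = softmax_causal(qk_h[np.ix_(t, t)])   # attention pattern from the QK table
        out = out + ov_h[:, t] @ A.T
    return out


def top_skip_trigrams(qk_h, ov_h, dst, k=2):
    """For one head and destination token, list (src, dst, out) triples: the k sources
    with the highest QK score, each paired with the output token its OV column boosts most."""
    srcs = np.argsort(qk_h[dst])[::-1][:k]
    return [(int(s), dst, int(np.argmax(ov_h[:, s]))) for s in srcs]


rng = np.random.default_rng(0)
n_vocab, d_model, d_head, n_heads = 6, 8, 4, 2      # hypothetical toy sizes
W_E = rng.normal(size=(d_model, n_vocab))           # token -> residual stream
W_U = rng.normal(size=(n_vocab, d_model))           # residual stream -> logits
W_Q, W_K, W_V = (rng.normal(size=(n_heads, d_head, d_model)) for _ in range(3))
W_O = rng.normal(size=(n_heads, d_model, d_head))

tokens = [3, 1, 4, 1, 5]
tables = read_tables(W_E, W_U, W_Q, W_K, W_V, W_O)
assert np.allclose(forward_logits(tokens, W_E, W_U, W_Q, W_K, W_V, W_O),
                   logits_from_tables(tokens, *tables))
print(top_skip_trigrams(tables[1][0], tables[2][0], dst=1))
```

For each head, the QK and OV tables are fixed n_vocab × n_vocab matrices. Only the softmax that mixes them depends on the context. That is why the tables themselves can be inspected without a forward pass.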

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)

Neighborhood, ranked by edge count

Findings (1)

Questions (1)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.