claim
active
claim:in-small-two-layer-attention-only-transformers-the-only-significant-composition-is-k-composition-between-a-single-first-layer-head-and-some-second-layer-heads

In small two-layer attention-only transformers, the only significant composition is K-composition between a single first-layer head and some second-layer heads

Empirical observation from the specific two-layer model analyzed in the source paper; no significant Q- or V-composition was found in that model.
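One way to make "significant composition" concrete is a normalized Frobenius-norm score between a first-layer head's output-value matrix and a second-layer head's query-key or output-value matrix. The sketch below is illustrative only and is not taken from this record. It assumes the convention that attention scores are computed as `x_dest^T W_QK x_src`, with all matrices of shape `(d_model, d_model)`, and the names `composition_score`, `composition_scores`, and the toy matrices are hypothetical. Under those assumptions, a K-composition score near 1 with Q and V scores near 0 would correspond to the pattern the claim describes.

```python
import numpy as np


def composition_score(w_second: np.ndarray, w_ov_first: np.ndarray) -> float:
    """Return ||A B||_F / (||A||_F ||B||_F), which lies in [0, 1]."""
    product = w_second @ w_ov_first
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return float(
        np.linalg.norm(product)
        / (np.linalg.norm(w_second) * np.linalg.norm(w_ov_first))
    )


def composition_scores(w_qk_2, w_ov_2, w_ov_1):
    """Q-, K-, and V-composition of one second-layer head with one first-layer head.

    Assumed convention: score(dest, src) = x_dest^T @ w_qk_2 @ x_src,
    so the key side reads through w_qk_2 and the query side through w_qk_2.T.
    """
    return {
        "q": composition_score(w_qk_2.T, w_ov_1),
        "k": composition_score(w_qk_2, w_ov_1),
        "v": composition_score(w_ov_2, w_ov_1),
    }


# Toy example (hypothetical weights, d_model = 4) built from basis vectors.
e = np.eye(4)
w_ov_1 = np.outer(e[0], e[1])  # first-layer head writes into direction e0
w_qk_2 = np.outer(e[2], e[0])  # second-layer keys read from e0, queries from e2
w_ov_2 = np.outer(e[3], e[2])  # second-layer values read from e2, not e0

scores = composition_scores(w_qk_2, w_ov_2, w_ov_1)
print(scores)  # {'q': 0.0, 'k': 1.0, 'v': 0.0}
```

In this toy setup only the key side of the second-layer head reads from the direction the first-layer head writes to, so only the K score is nonzero. With real weights the scores would fall strictly between 0 and 1, and deciding what counts as "significant" would need a baseline, for example scores for random matrices of the same shapes.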

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)
