question
active
question:do-we-fully-understand-one-layer-attention-only-transformers

Do we 'fully understand' one-layer attention-only transformers?

The paper explicitly poses this question and concludes that the answer depends on what 'fully understand' means.

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits
(2021)

Neighborhood — ranked by edge-count

Claims (1)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.