claim
active
claim:naive-interpretation-of-attention-patterns-can-be-both-informative-and-fundamentally-misleading-when-q-k-or-v-composition-is-present

Naive interpretation of attention patterns can be both informative and fundamentally misleading when Q-, K-, or V-composition is present

Responds to the 'attention as explanation' critique: the paper provides a typology of when attention patterns are directly interpretable and when they are not.
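
As a toy illustration of the V-composition case, the sketch below composes two hand-written attention patterns. All matrices and names (`A1`, `A2`, `matmul`, `argmax`) are hypothetical and chosen for illustration; they are not taken from the paper. The sketch also ignores the OV weight matrices and the direct residual path, so it only shows the positional part of the effect: a layer-2 head's own pattern can point at one token while the information it moves originated at another.

```python
# Attention patterns as row-stochastic matrices:
# A[dst][src] = weight that destination token `dst` places on source token `src`.
# Both matrices are lower triangular, i.e. causal.

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical layer-1 head: position 1 reads from position 0.
A1 = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical layer-2 head: position 2 reads from position 1.
A2 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]

# Composed ("virtual head") pattern: where the information reaching each
# destination through layer 2 via layer 1 actually came from.
effective = matmul(A2, A1)

naive_source = argmax(A2[2])             # what the layer-2 pattern suggests
effective_source = argmax(effective[2])  # where the information originated

print("layer-2 pattern says position 2 reads from:", naive_source)
print("composed pattern says the information came from:", effective_source)
```

Reading `A2` alone is informative (position 2 does attend to position 1) but misleading about provenance, since the content at position 1 was itself copied from position 0. Q- and K-composition are not covered by this sketch; they change how the layer-2 pattern is computed rather than what it moves.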

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.