quote
active
quote:we-revealed-the-one-layer-attention-only-model-to-be-a-compressed-chinese-room-and-we-re-left-with-a-giant-pile-of-cards

We revealed the one-layer attention-only model to be a compressed Chinese room, and we're left with a giant pile of cards.

A vivid characterization of the limits of understanding after the model is converted to skip-trigram form: no algorithmic mystery remains, but the sheer scale prevents holistic comprehension.

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)
