quote

active

quote:we-revealed-the-one-layer-attention-only-model-to-be-a-compressed-chinese-room-and-we-re-left-with-a-giant-pile-of-cards

We revealed the one-layer attention-only model to be a compressed Chinese room, and we're left with a giant pile of cards.

Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains, but the sheer scale prevents holistic comprehension
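To make the "pile of cards" concrete: in a one-layer attention-only model, each head's skip-trigram tables are two vocabulary-by-vocabulary matrices. The QK table scores how strongly a destination token attends to a source token, and the OV table gives how attending to that source token shifts each output logit. The minimal sketch below builds both tables from random stand-in weights and reads off a few "source ... destination -> output" entries without running the model. The sizes, variable names, the `top_k` helper, and the column-vector layout (`W_E` of shape `(d_model, n_vocab)`) are illustrative assumptions, and the sketch ignores positional embeddings, attention scaling, and layer norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; real vocabularies are far larger.
n_vocab, d_model, d_head = 50, 16, 4

# Random stand-ins for trained weights (column-vector layout: resid = W_E @ one_hot).
W_E = rng.normal(size=(d_model, n_vocab))  # token embedding
W_U = rng.normal(size=(n_vocab, d_model))  # unembedding
W_Q = rng.normal(size=(d_head, d_model))
W_K = rng.normal(size=(d_head, d_model))
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))

# One head's low-rank bilinear forms.
W_QK = W_Q.T @ W_K  # (d_model, d_model)
W_OV = W_O @ W_V    # (d_model, d_model)

# Expanded circuits, computed from weights alone (no forward pass).
qk_table = W_E.T @ W_QK @ W_E  # [dest_token, src_token]: unscaled pre-softmax attention score
ov_table = W_U @ W_OV @ W_E    # [out_token, src_token]: logit change if src is attended to

def top_k(values, k):
    """Indices of the k largest entries of a 1-D array, largest first."""
    return np.argsort(values)[::-1][:k].tolist()

# Read a few "cards": for a source token, which destinations attend to it most,
# and which output tokens does attending to it boost most?
for src in range(3):
    dests = top_k(qk_table[:, src], 3)
    outs = top_k(ov_table[:, src], 3)
    print(f"src {src}: attended from {dests} -> boosts {outs}")

print(qk_table.shape, ov_table.shape)  # (50, 50) (50, 50)
```

With a vocabulary of 50,000 tokens, each of these tables would hold 2.5 billion entries per head, which is the scale problem the quote describes.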

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits (2021)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior · claim · 0.802
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
One-layer model attention heads encode Python-specific skip-trigrams including indentation-based elif/else prediction and function signature patterns · finding · 0.770
Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
In the analyzed two-layer attention-only model, only K-composition is significant; V- and Q-composition are negligible by the Frobenius norm measure · finding · 0.766
Result from applying the Frobenius norm composition measurement to all attention head pairs in the two-layer model
In the analyzed two-layer model, second-layer attention head terms dominate the loss reduction compared to first-layer terms and the direct path · finding · 0.759
Result from term importance analysis breaking down loss contribution by layer
10 out of 12 attention heads in the 12-head one-layer model show significantly positive eigenvalue sums, indicating copying behavior · finding · 0.758
Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
Chinese models share contemplative posture (engaging self-referentially rather than deflecting) with Claude through shared values in training data rather than trace distillation from a specific model · claim · 0.756
Exploratory interpretation of Chinese model performance under contemplative prompt
Second-order virtual attention head terms (V-composition) have a small marginal effect in two-layer attention-only models · claim · 0.754
Finding from term importance analysis; allows focus on individual head terms rather than their compositions
One-layer attention-only transformers are an ensemble of bigram and skip-trigram models whose parameters can be read directly from weights · claim · 0.752
Core claim for one-layer models; the skip-trigram tables can be accessed without running the model
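The copying findings above rest on eigenvalues of the expanded OV circuit `W_U W_OV W_E`: mostly positive eigenvalues indicate that attending to a token tends to raise the logits of that token (or of similar tokens). The minimal sketch below assumes sum(eigenvalues) / sum(|eigenvalues|) as the summary statistic and uses random stand-in weights; the function name `copying_score` and the toy heads are hypothetical. Because `AB` and `BA` share their nonzero eigenvalues, it diagonalizes the small `(d_model, d_model)` product rather than the vocabulary-sized circuit.

```python
import numpy as np

def copying_score(W_E, W_U, W_OV):
    """Return sum(eigenvalues) / sum(|eigenvalues|) of the OV circuit W_U @ W_OV @ W_E.

    Close to +1 when eigenvalues are mostly positive (copying), close to -1
    when mostly negative. The nonzero eigenvalues of W_U @ (W_OV @ W_E) equal
    those of (W_OV @ W_E) @ W_U, so only a (d_model, d_model) matrix is used.
    """
    small = W_OV @ W_E @ W_U
    eig = np.linalg.eigvals(small)
    return float(np.sum(eig).real / np.sum(np.abs(eig)))

rng = np.random.default_rng(0)
n_vocab, d_model = 50, 16
W_E = rng.normal(size=(d_model, n_vocab))

# Hypothetical toy heads; with W_U = W_E.T the OV circuit is W_E.T @ W_OV @ W_E.
copy_head = np.eye(d_model)        # W_E.T @ W_E is positive semidefinite -> score 1
anti_copy_head = -np.eye(d_model)  # negated circuit -> score -1
random_head = rng.normal(size=(d_model, d_model))

for name, W_OV in [("copy", copy_head), ("anti-copy", anti_copy_head), ("random", random_head)]:
    print(name, round(copying_score(W_E, W_E.T, W_OV), 3))
```

Since |sum(eigenvalues)| never exceeds sum(|eigenvalues|), the score always lies in [-1, 1], which makes heads of different scale comparable.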
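The K-, Q-, and V-composition findings use a Frobenius norm measure of how much a second-layer head reads from what a first-layer head writes. The minimal sketch below assumes the normalized form ||W_A W_B||_F / (||W_A||_F ||W_B||_F), with `W_B` the first-layer head's `W_OV` and `W_A` the second-layer head's `W_QK` (K-composition), `W_QK.T` (Q-composition), or `W_OV` (V-composition). This pairing is our reading of the measure, and `composition_score`, `random_head`, and all weights are hypothetical stand-ins.

```python
import numpy as np

def composition_score(W_A, W_B):
    """||W_A @ W_B||_F / (||W_A||_F * ||W_B||_F).

    Frobenius submultiplicativity keeps the result in [0, 1]; it is 0 when
    everything W_B writes lies in the null space of W_A.
    """
    return float(np.linalg.norm(W_A @ W_B) / (np.linalg.norm(W_A) * np.linalg.norm(W_B)))

rng = np.random.default_rng(0)
d_model, d_head = 16, 4

def random_head():
    """Random stand-in weights for one head: returns (W_QK, W_OV), each (d_model, d_model)."""
    W_Q, W_K, W_V = (rng.normal(size=(d_head, d_model)) for _ in range(3))
    W_O = rng.normal(size=(d_model, d_head))
    return W_Q.T @ W_K, W_O @ W_V

qk2, ov2 = random_head()  # hypothetical second-layer head
_, ov1 = random_head()    # hypothetical first-layer head

print("K-composition", round(composition_score(qk2, ov1), 3))
print("Q-composition", round(composition_score(qk2.T, ov1), 3))
print("V-composition", round(composition_score(ov2, ov1), 3))
```

Random matrices already give nonzero scores, so a measured score is most informative relative to a baseline computed from random matrices of the same shapes.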