finding
active
In the analyzed two-layer attention-only model, only K-composition is significant; V- and Q-composition are negligible by the Frobenius norm measure
Result from applying the Frobenius norm composition measurement to all attention head pairs in the two-layer model
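A minimal sketch of how such a measurement could be computed, assuming the measure is the Frobenius norm of the product of two heads' matrices divided by the product of their individual Frobenius norms. The function names, the plain list-of-rows matrix format, and the convention that the query side multiplies the QK matrix from the left are assumptions made for illustration; they are not taken from this record.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius(a):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))

def composition_score(w_second, w_first):
    """||W_second W_first||_F / (||W_second||_F * ||W_first||_F), a value in [0, 1]."""
    return frobenius(matmul(w_second, w_first)) / (
        frobenius(w_second) * frobenius(w_first)
    )

def composition_scores(w_qk_2, w_ov_2, w_ov_1):
    """Q-, K-, and V-composition scores for one (layer-1 head, layer-2 head) pair.

    Assumed convention: w_qk_2 is applied as query^T @ w_qk_2 @ key, so the key
    side reads w_ov_1's output through w_qk_2 and the query side through its
    transpose.
    """
    return {
        "Q": composition_score(transpose(w_qk_2), w_ov_1),
        "K": composition_score(w_qk_2, w_ov_1),
        "V": composition_score(w_ov_2, w_ov_1),
    }

# Toy 2x2 matrices (hypothetical values, not model weights).
toy_scores = composition_scores(
    w_qk_2=[[0.0, 1.0], [0.0, 0.0]],
    w_ov_2=[[1.0, 0.0], [0.0, 1.0]],
    w_ov_1=[[1.0, 0.0], [0.0, 0.0]],
)
print(toy_scores)
```

Applying `composition_scores` to every pairing of a first-layer head with a second-layer head yields one score table per composition type. Under this sketch, the finding corresponds to the K table containing clearly larger entries than the Q and V tables.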
Neighborhood — ranked by edge-count
Claims (1)
claim
- Empirical observation from the specific two-layer model analyzed; no significant V- or Q-composition found
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Finding from term importance analysis; allows focus on individual head terms rather than their compositions
- Response to the 'attention as explanation' critique; the paper provides a typology of when attention is and isn't directly interpretable
- Result of term importance analysis ablation experiment; justifies focusing on individual head terms
- Result from term importance analysis breaking down loss contribution by layer
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
- Core claim for two-layer models; composition creates qualitatively more powerful in-context learning
- Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
- Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension