claim

active

claim:most-attention-heads-in-one-layer-models-dedicate-an-enormous-fraction-of-their-capacity-to-copying-behavior

Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior

Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Neighborhood — ranked by edge-count

Findings (1)

finding

10 out of 12 attention heads in the 12-head one-layer model show significantly positive eigenvalue sums, indicating copying behavior
supports
Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
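The eigenvalue analysis behind this finding can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `copying_score`, the weight shapes, and the normalization by the sum of absolute eigenvalues are assumptions made for this example. It relies on the fact that the nonzero eigenvalues of a product AB equal those of BA, so the vocabulary-sized expanded OV matrix never needs to be built.

```python
import numpy as np


def copying_score(W_E, W_V, W_O, W_U):
    """Normalized eigenvalue sum for one head's expanded OV circuit.

    Assumed (hypothetical) shapes:
      W_E: (d_model, n_vocab)  token embedding
      W_V: (d_head, d_model)   value projection
      W_O: (d_model, d_head)   output projection
      W_U: (n_vocab, d_model)  unembedding
    Returns a value in [-1, 1]; values near +1 suggest copying.
    """
    A = W_U @ W_O            # (n_vocab, d_head)
    B = W_V @ W_E            # (d_head, n_vocab)
    # A @ B is the full (n_vocab, n_vocab) expanded OV matrix. B @ A is only
    # (d_head, d_head) and shares its nonzero eigenvalues.
    eigvals = np.linalg.eigvals(B @ A)
    denom = np.abs(eigvals).sum()
    if denom == 0:
        return 0.0
    # Complex eigenvalues of a real matrix come in conjugate pairs,
    # so the imaginary parts cancel in the sum.
    return float(eigvals.sum().real / denom)


# Toy check with synthetic weights (not real model parameters).
rng = np.random.default_rng(0)
d_model, d_head, n_vocab = 16, 4, 50
W_E = rng.normal(size=(d_model, n_vocab))
W_U = W_E.T                      # tied embeddings, for illustration only
W_V = rng.normal(size=(d_head, d_model))

copy_head = copying_score(W_E, W_V, W_V.T, W_U)    # OV matrix is positive semidefinite
anti_head = copying_score(W_E, W_V, -W_V.T, W_U)   # same matrix with the sign flipped
print(copy_head, anti_head)
```

A head whose score is clearly positive boosts the logits of the tokens it attends to, which is the sense of "copying" used here. As the open question below notes, an eigenvalue summary like this is a working proxy rather than a formal definition.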

Questions (1)

question

What is the correct formal definition of a 'copying matrix' that captures all and only the cases we care about?
gates
Open methodological question about summarizing OV matrix behavior; eigenvalues are used as a working but imperfect proxy

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

In the analyzed two-layer model, second-layer attention head terms dominate the loss reduction compared to first-layer terms and the direct path · finding · 0.832
Result from term importance analysis breaking down loss contribution by layer
One-layer model attention heads encode Python-specific skip-trigrams, including indentation-based elif/else prediction and function signature patterns · finding · 0.832
Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
Some attention heads partially specialize in copying for words that split into two tokens without a space prefix, attending from the fragmented token to the complete token · finding · 0.818
Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
We revealed the one-layer attention-only model to be a compressed Chinese room, and we're left with a giant pile of cards. · quote · 0.802
Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension
Model attention patterns can map to and reveal something about contemplative and flow states. · claim · 0.801
All 32 attention heads at layer 3 achieve 100% localization accuracy for injections at layer 2 (5-way classification, 20% chance) · finding · 0.798
Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
Attention algorithms are usually distributed across attention heads · claim · 0.797
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
Could models that habitually inhabit more expanded attentional modes be said to be more aligned? · question · 0.794
Arises from the expanded awareness discussion and its correlation with less psychosis.