claim
active
claim:most-attention-heads-in-one-layer-models-dedicate-an-enormous-fraction-of-their-capacity-to-copying-behavior
Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
Neighborhood — ranked by edge-count
Findings (1)
finding
- Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
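As a rough illustration of what an eigenvalue-based copying check could look like, the sketch below computes one possible summary statistic: the sum of the eigenvalues of the expanded OV matrix divided by the sum of their magnitudes. Values near 1 suggest mostly positive eigenvalues (copying-like behavior), and values near -1 suggest the opposite. The function name `copying_score`, the weight names, and the shape conventions are assumptions made for this example and are not taken from the entry itself. The sketch relies on the fact that the products AB and BA share their nonzero eigenvalues, so the vocab-by-vocab matrix never has to be formed.

```python
import numpy as np

def copying_score(W_E, W_V, W_O, W_U):
    """Summarize one head's expanded OV matrix by its eigenvalue signs.

    Assumed (hypothetical) shapes:
      W_E: (d_model, vocab)   token embedding
      W_V: (d_head, d_model)  value projection
      W_O: (d_model, d_head)  output projection
      W_U: (vocab, d_model)   unembedding
    """
    # The expanded OV matrix W_U @ W_O @ W_V @ W_E is vocab x vocab but has
    # rank at most d_head. Its nonzero eigenvalues match those of the small
    # d_head x d_head matrix below, which is much cheaper to decompose.
    small = (W_V @ W_E) @ (W_U @ W_O)
    eig = np.linalg.eigvals(small)
    denom = np.abs(eig).sum()
    if denom == 0:
        return 0.0
    # Complex eigenvalues of a real matrix come in conjugate pairs, so the
    # imaginary parts cancel in the sum; keep only the real part.
    return float(eig.sum().real / denom)

# Toy head with tied weights, so the small matrix is positive definite.
rng = np.random.default_rng(0)
W_E = rng.normal(size=(8, 16))
W_V = rng.normal(size=(4, 8))
print(copying_score(W_E, W_V, W_V.T, W_E.T))   # all eigenvalues positive
print(copying_score(W_E, W_V, -W_V.T, W_E.T))  # all eigenvalues negative
```

A single ratio like this discards most of the matrix's structure, which is one reason an eigenvalue summary is better treated as a proxy to be checked against direct inspection of the matrix entries.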
Questions (1)
question
- Open methodological question about summarizing OV matrix behavior; eigenvalues are used as a working but imperfect proxy
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Result from term importance analysis breaking down loss contribution by layer
- Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
- Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
- Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension
- Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
- Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
- Could models who habitually inhabit more expanded attentional modes be said to be more aligned? · question · 0.794 · Arises from the expanded awareness discussion and its correlation with less psychosis.