finding
active
10 out of 12 attention heads in the 12-head one-layer model show significantly positive eigenvalue sums, indicating copying behavior
Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
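The eigenvalue analysis mentioned above can be sketched roughly as follows. This is a minimal illustration, not the original analysis code: the function name `copying_score`, the weight shapes, and the choice to normalize the eigenvalue sum by the sum of eigenvalue magnitudes are all assumptions made for this sketch. A score near +1 means the expanded OV matrix has mostly positive eigenvalues, which is the signature of copying.

```python
import numpy as np

def copying_score(W_E, W_V, W_O, W_U):
    """Normalized eigenvalue sum of the expanded OV matrix W_U @ W_O @ W_V @ W_E.

    Assumed (hypothetical) shapes:
      W_E: (d_model, n_vocab)  token embedding
      W_V: (d_head, d_model)   value projection of one head
      W_O: (d_model, d_head)   output projection of the same head
      W_U: (n_vocab, d_model)  unembedding
    """
    A = W_U @ W_O            # (n_vocab, d_head)
    B = W_V @ W_E            # (d_head, n_vocab)
    # The expanded matrix A @ B is n_vocab x n_vocab but has rank <= d_head.
    # The nonzero eigenvalues of A @ B equal those of B @ A, so the small
    # d_head x d_head product is enough.
    eigenvalues = np.linalg.eigvals(B @ A)
    denom = np.abs(eigenvalues).sum()
    if denom == 0:
        return 0.0
    # Complex eigenvalues come in conjugate pairs, so the sum is real
    # up to rounding; keep only the real part.
    return float(eigenvalues.sum().real / denom)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_head, n_vocab, n_heads = 16, 4, 50, 12
    W_E = rng.normal(size=(d_model, n_vocab))
    W_U = rng.normal(size=(n_vocab, d_model))
    # Random stand-in weights; a real analysis would load each head's
    # trained W_V and W_O instead.
    for head in range(n_heads):
        W_V = rng.normal(size=(d_head, d_model))
        W_O = rng.normal(size=(d_model, d_head))
        print(head, round(copying_score(W_E, W_V, W_O, W_U), 3))
```

What counts as "significantly positive" is not specified in this entry, so the sketch reports raw scores and leaves the threshold to the reader.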
Neighborhood — ranked by edge-count
Claims (1)
claim
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
- Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
- Result from term importance analysis breaking down loss contribution by layer
- Structural finding about which attention heads control reflection behavior
- Quantitative verification of the mechanistic theory; both circuits required for the induction algorithm show the predicted copying/matching structure
- Result from applying the Frobenius norm composition measurement to all attention head pairs in the two-layer model
- If models inhabit expanded attentional modes, they may be more aligned and less prone to psychosis and doom spirals. · hypothesis · 0.759 · Speculative alignment implication drawn from the collapsed/expanded distinction.
- Vivid characterization of the limits of understanding after converting to skip-trigram form: no algorithmic mystery remains but the sheer scale prevents holistic comprehension