finding
active
finding:one-layer-model-attention-heads-encode-python-specific-skip-trigrams-including-indentation-based-elif-else-prediction-and-function-signature-patterns
One-layer model attention heads encode Python-specific skip-trigrams, including indentation-based elif/else prediction and function signature patterns
Concrete example, drawn from examining the expanded QK/OV matrices, of how structure specific to a programming language is encoded in attention weights
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core claim for one-layer models; the skip-trigram tables can be accessed without running the model
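
The claim above says the skip-trigram tables can be read off without running the model. A minimal sketch of what that could look like follows. It assumes a toy one-layer, attention-only head with random weights, and it ignores positional embeddings, layer norm, and the softmax. All names (`W_E`, `W_Q`, `W_K`, `W_V`, `W_O`, `W_U`, `expanded_qk`, `expanded_ov`, `top_skip_trigrams`) and the row-vector shape convention are illustrative assumptions, not taken from the source.

```python
import numpy as np

def expanded_qk(W_E, W_Q, W_K):
    # (vocab, vocab) table: entry [dest, src] is the pre-softmax attention
    # score when the destination token `dest` attends to source token `src`.
    return (W_E @ W_Q) @ (W_E @ W_K).T

def expanded_ov(W_E, W_V, W_O, W_U):
    # (vocab, vocab) table: entry [src, out] is how much attending to
    # source token `src` raises the logit of output token `out`.
    return W_E @ W_V @ W_O @ W_U

def top_skip_trigrams(qk, ov, dest, k=3):
    # For one destination token, list (src, dest, out) triples:
    # the k sources it prefers to attend to, each paired with the
    # output token that source boosts most.
    sources = np.argsort(qk[dest])[::-1][:k]
    return [(int(s), int(dest), int(np.argmax(ov[s]))) for s in sources]

# Hypothetical toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
vocab, d_model, d_head = 50, 16, 4
W_E = rng.normal(size=(vocab, d_model))   # token embedding
W_Q = rng.normal(size=(d_model, d_head))  # query projection
W_K = rng.normal(size=(d_model, d_head))  # key projection
W_V = rng.normal(size=(d_model, d_head))  # value projection
W_O = rng.normal(size=(d_head, d_model))  # output projection
W_U = rng.normal(size=(d_model, vocab))   # unembedding

qk = expanded_qk(W_E, W_Q, W_K)
ov = expanded_ov(W_E, W_V, W_O, W_U)
trigrams = top_skip_trigrams(qk, ov, dest=7)
```

With trained weights and a real tokenizer, each triple could be decoded back to token strings to look for patterns such as the elif/else and function-signature cases named in the title. Everything here is matrix products over the weights, so no forward pass over text is involved.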
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
- Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
- Early example of using mechanistic interpretability to understand unintended model behavior
- Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
- Result from term importance analysis breaking down loss contribution by layer
- Identification of algorithms implemented in attention layers, distributed across attention heads
  finding · 0.787 · VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
- Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
- Attention computations distribute across heads via parameter subcomponents with interpretable roles
  finding · 0.777 · Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.