finding

active

finding:one-layer-model-attention-heads-encode-python-specific-skip-trigrams-including-indentation-based-elif-else-prediction-and-function-signature-patterns

One-layer model attention heads encode Python-specific skip-trigrams including indentation-based elif/else prediction and function signature patterns

Concrete example from examining the expanded QK/OV matrices, showing how structure specific to a programming language is encoded in attention weights

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits (2021)

Neighborhood — ranked by edge-count

Claims (1)

claim

One-layer attention-only transformers are an ensemble of bigram and skip-trigram models whose parameters can be read directly from weights
supports
Core claim for one-layer models; the skip-trigram tables can be accessed without running the model
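To make "read directly from weights" concrete, here is a minimal NumPy sketch of how the expanded QK and OV matrices could be formed and queried for skip-trigrams. It assumes a single head, ignores positional terms, layer norm, and biases, and uses random matrices as stand-ins for trained weights. The sizes and the helper `top_skip_trigrams` are hypothetical and not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, d_model, d_head = 50, 16, 4  # toy sizes (assumption)

# Random stand-ins for trained weights; a real analysis would load them.
W_E = rng.normal(size=(d_model, n_vocab))  # token embedding
W_U = rng.normal(size=(n_vocab, d_model))  # unembedding
W_Q = rng.normal(size=(d_head, d_model))   # query projection
W_K = rng.normal(size=(d_head, d_model))   # key projection
W_V = rng.normal(size=(d_head, d_model))   # value projection
W_O = rng.normal(size=(d_model, d_head))   # output projection

# Expanded QK matrix: qk[dst, src] is the pre-softmax attention score
# when destination token `dst` looks at source token `src`.
qk = W_E.T @ W_Q.T @ W_K @ W_E

# Expanded OV matrix: ov[out, src] is the change in the logit of `out`
# caused by attending to source token `src`.
ov = W_U @ W_O @ W_V @ W_E

# Direct path: bigram[out, cur] is the logit contribution of the
# current token `cur` to the next token `out`.
bigram = W_U @ W_E


def top_skip_trigrams(dst, k=3):
    """Return k (src, dst, out) triples for one destination token.

    For the k sources that `dst` attends to most strongly, report the
    output token whose logit that source raises the most.
    """
    src_ids = np.argsort(qk[dst])[::-1][:k]
    return [(int(s), int(dst), int(np.argmax(ov[:, s]))) for s in src_ids]


print(top_skip_trigrams(dst=7))
```

Nothing here runs a forward pass: every table is a product of weight matrices, which is the sense in which the skip-trigram tables are available without running the model. With trained weights and a real tokenizer, the token ids in each triple would be decoded back to strings before inspection.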

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior · claim · 0.832
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
10 out of 12 attention heads in the 12-head one-layer model show significantly positive eigenvalue sums, indicating copying behavior · finding · 0.800
Quantitative result from eigenvalue analysis of expanded OV matrices; confirmed by qualitative inspection
Skip-trigram bugs in one-layer models demonstrate interpretability can reveal and characterize specific model failure modes · claim · 0.800
Early example of using mechanistic interpretability to understand unintended model behavior
Induction heads explain in-context learning in small models and only develop in models with at least two attention layers · claim · 0.793
Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
In the analyzed two-layer model, second-layer attention head terms dominate the loss reduction compared to first-layer terms and the direct path · finding · 0.787
Result from term importance analysis breaking down loss contribution by layer
Identification of algorithms implemented in attention layers, distributed across attention heads · finding · 0.787
VPD successfully recovered interpretable attention algorithms (previous-token behavior, syntax-boundary routing) in weight space without requiring manual decomposition across heads.
Some attention heads partially specialize in copying for words that split into two tokens without a space prefix, attending from fragmented token to complete token · finding · 0.785
Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
Attention computations distribute across heads via parameter subcomponents with interpretable roles · finding · 0.777
Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
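
The eigenvalue sums mentioned in the related findings above can be turned into a single number per head. The sketch below normalizes the sum of the eigenvalues of an expanded OV matrix by the sum of their magnitudes, so a value near 1 suggests copying. The function name `copying_score` and the toy matrices are illustrative assumptions, not the paper's code.

```python
import numpy as np


def copying_score(ov):
    """Sum of eigenvalues divided by sum of their magnitudes.

    `ov` is a square (n_vocab, n_vocab) expanded OV matrix. The result
    lies in [-1, 1]; values near 1 mean the eigenvalues are mostly
    positive, which is read as copying behavior.
    """
    eigs = np.linalg.eigvals(ov)
    # Complex eigenvalues of a real matrix come in conjugate pairs,
    # so the sum is real up to rounding error.
    return float(eigs.sum().real / np.abs(eigs).sum())


rng = np.random.default_rng(0)
a = rng.normal(size=(20, 4))
psd = a @ a.T  # toy low-rank matrix with non-negative eigenvalues

print(copying_score(psd))    # close to 1: a purely "copying" matrix
print(copying_score(-psd))   # close to -1: the opposite extreme
```

Because an expanded OV matrix has rank at most the head dimension, most of its eigenvalues are zero and contribute nothing to either sum, so the score depends only on the few nonzero ones.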