finding
active

One-layer model attention heads encode Python-specific skip-trigrams including indentation-based elif/else prediction and function signature patterns

A concrete example, drawn from inspecting the expanded QK and OV matrices, of how the structure of a specific programming language is encoded in attention weights.
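To make the idea of a skip-trigram concrete, here is a minimal toy sketch of how one might read "source ... destination → output" patterns off expanded QK and OV circuits. Everything in it is a hypothetical illustration: the four-token vocabulary, the 2-dimensional embeddings, the matrices `W_QK` and `W_OV`, and the helper names `bilinear` and `skip_trigrams` are invented for this sketch and are not taken from the paper. It also ignores softmax and positional information, and ranking by the product of the two scores is an assumption chosen for simplicity.

```python
# Toy sketch (all numbers hypothetical): reading skip-trigrams
# "source ... destination -> output" off expanded QK and OV circuits.

# Token embeddings (rows of W_E^T) and unembeddings (rows of W_U), d_model = 2.
EMBED = {
    "<indent3>": [1.0, 0.0],   # stands in for a deeper indentation token
    "<indent2>": [0.0, 1.0],   # stands in for a shallower indentation token
    "else":      [0.5, 0.5],
    "foo":       [-1.0, 0.0],
}
UNEMBED = {
    "<indent3>": [-1.0, 0.0],
    "<indent2>": [0.0, 1.0],
    "else":      [1.0, 0.0],
    "foo":       [0.0, 0.0],
}

# Low-rank products for one head: W_QK = W_Q^T W_K and W_OV = W_O W_V.
W_QK = [[0.0, 0.0],
        [4.0, 0.0]]
W_OV = [[3.0, 0.0],
        [0.0, 0.0]]


def bilinear(left, matrix, right):
    """Compute left^T @ matrix @ right for small dense Python lists."""
    return sum(
        left[i] * matrix[i][j] * right[j]
        for i in range(len(left))
        for j in range(len(right))
    )


def skip_trigrams(embed, unembed, w_qk, w_ov):
    """Return (score, source, destination, output) tuples, best first.

    qk: how strongly the destination token attends to the source token.
    ov: how much attending to the source raises the output token's logit.
    Only trigrams where both effects are positive are kept.
    """
    found = []
    for src, e_src in embed.items():
        for dst, e_dst in embed.items():
            qk = bilinear(e_dst, w_qk, e_src)
            if qk <= 0:
                continue
            for out, u_out in unembed.items():
                ov = bilinear(u_out, w_ov, e_src)
                if ov > 0:
                    found.append((qk * ov, src, dst, out))
    return sorted(found, reverse=True)


ranked = skip_trigrams(EMBED, UNEMBED, W_QK, W_OV)
for score, src, dst, out in ranked:
    print(f"{src} ... {dst} -> {out}  (score {score})")
```

In this toy setup the strongest entry pairs a deeper-indentation source with a shallower-indentation destination and boosts `else`, which mirrors the kind of indentation-based pattern the finding describes. In a real model the same tables would be vocabulary-sized, so one would typically inspect only the largest entries.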

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)
