finding

active

finding:some-attention-heads-partially-specialize-in-copying-for-words-that-split-into-two-tokens-without-a-space-prefix-attending-from-fragmented-token-to-complete-token

Some attention heads partially specialize in copying for words that split into two tokens without a space prefix, attending from fragmented token to complete token

Interesting special case of copying behavior related to tokenization artifacts; primitive precursor to induction heads
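As a rough illustration of how copying behavior in a head can be quantified, the sketch below computes an eigenvalue-based summary of an expanded OV circuit: the sum of the eigenvalues divided by the sum of their magnitudes, where values near 1 suggest copying. The function name `copying_score`, the matrix shape conventions, and the identity-matrix toy inputs are assumptions made for this example. It does not reproduce the specific fragmented-token analysis behind this finding.

```python
import numpy as np


def copying_score(W_E, W_OV, W_U):
    """Summarize how copy-like a head's expanded OV circuit is.

    Assumed (hypothetical) shape conventions:
      W_E  : (d_model, vocab)   token embedding
      W_OV : (d_model, d_model) the head's combined output-value matrix
      W_U  : (vocab, d_model)   unembedding
    Returns sum(eigenvalues) / sum(|eigenvalues|), a value in [-1, 1].
    """
    # Expanded OV circuit: maps an attended-to token to its effect on output logits.
    full_ov = W_U @ W_OV @ W_E
    eigenvalues = np.linalg.eigvals(full_ov)
    total_magnitude = np.abs(eigenvalues).sum()
    if total_magnitude == 0:
        return 0.0
    # For a real matrix, complex eigenvalues come in conjugate pairs,
    # so the sum of the real parts equals the sum of the eigenvalues.
    return float(eigenvalues.real.sum() / total_magnitude)


# Toy inputs (not real model weights): an identity circuit copies every token.
identity = np.eye(5)
copy_head = copying_score(identity, identity, identity)
anti_copy_head = copying_score(identity, -identity, identity)
```

On real weights the expanded matrix is vocabulary-sized, so in practice one would likely compute the eigenvalues of a smaller equivalent product; the toy call above only checks the two extreme cases.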

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits

(2021)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

A pair of query and key subcomponents distributed across attention heads performs previous-token behavior · finding · 0.822
VPD recovers an attention algorithm for attending to the previous token, distributed across multiple heads.
Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior · claim · 0.818
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
Attention heads can be understood as independent operations each adding their output to the residual stream, equivalent to the concatenate-and-multiply formulation · claim · 0.813
Mathematical equivalence enabling independent analysis of each attention head
Attention algorithms are usually distributed across attention heads · claim · 0.791
Claim supported by VPD's recovery of cross-head attention subcomponents, noted in footnote.
The ability to distinguish injected thoughts from text likely relies on different attention heads invoked by different prompt parts · claim · 0.791
Speculation about the mechanistic basis of the distinguishing thoughts from text experiment.
Attention computations distribute across heads via parameter subcomponents with interpretable roles · finding · 0.786
Mechanistic discovery about how attention mechanisms decompose into interpretable parameter components.
One-layer model attention heads encode Python-specific skip-trigrams including indentation-based elif/else prediction and function signature patterns · finding · 0.785
Concrete example from examining expanded QK/OV matrices showing how specific programming language structure is encoded in attention weights
Introspection relies on general-purpose computational mechanisms—attention-based anomaly detection and residual stream dynamics—rather than specialized introspection circuits · claim · 0.780
Interpretive claim about the mechanistic substrate of introspection in LLMs