finding
active
finding:some-attention-heads-partially-specialize-in-copying-for-words-that-split-into-two-tokens-without-a-space-prefix-attending-from-fragmented-token-to-complete-token

Some attention heads partially specialize in copying words that split into two tokens when they lack a space prefix, attending from the fragmented token to the complete token

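An interesting special case of copying behavior that arises from a tokenization artifact: the same word can be one complete token when it has a space prefix and two fragments when it does not. A primitive precursor to induction heads.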
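The tokenization artifact behind this finding can be sketched with a toy example. The vocabulary, the sample text, and the function names `greedy_tokenize` and `find_fragment_copy_pairs` below are hypothetical illustrations and are not taken from the source paper. The greedy longest-match tokenizer is a simplified stand-in for a real subword tokenizer. The sketch only locates token positions where such a head could attend. It does not model attention itself, and it does not say which of the two fragment positions the attention originates from.

```python
# Hypothetical toy vocabulary: the space-prefixed word is one token,
# but the bare word has no entry and must be built from two fragments.
TOY_VOCAB = {" Ralph", "R", "alph", " said", " hello", ","}


def greedy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization (a stand-in, not real BPE)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r} at position {i}")
    return tokens


def find_fragment_copy_pairs(tokens):
    """Return ((first_fragment_idx, second_fragment_idx), complete_idx) tuples.

    A pair of adjacent tokens without a space prefix is treated as a
    fragmented word; a match is an EARLIER single token equal to a space
    followed by the joined fragments.
    """
    pairs = []
    for i in range(len(tokens) - 1):
        first, second = tokens[i], tokens[i + 1]
        if first.startswith(" ") or second.startswith(" "):
            continue
        complete = " " + first + second
        for k in range(i):
            if tokens[k] == complete:
                pairs.append(((i, i + 1), k))
    return pairs


sample_tokens = greedy_tokenize(" Ralph said hello,Ralph")
print(sample_tokens)
print(find_fragment_copy_pairs(sample_tokens))
```

In this toy setup the first occurrence is the single token " Ralph", while the second occurrence, which follows a comma with no space, becomes "R" and "alph". The returned pair links the fragment positions to the earlier complete token, which is the kind of position pairing the finding describes.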

Source paper

extracted_from
A Mathematical Framework for Transformer Circuits (2021)
