hypothesis
active

GPT-2 implements at least one induction head using pointer arithmetic on positional embeddings rather than K-composition

The source paper reports observing an alternative algorithm for implementing induction heads in larger models that carry positional embeddings in the residual stream.
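To make the contrast with K-composition concrete, here is a minimal, non-neural sketch of what "pointer arithmetic on positions" means as an algorithm: look up the position of an earlier copy of the current token, add one to that position, and copy the token found there. The function name `induction_predict` and the list-based representation are hypothetical and chosen only for illustration. This is a toy analogue under those assumptions, not the circuit actually found in GPT-2.

```python
def induction_predict(tokens):
    """Toy analogue of an induction head that works by position lookup.

    For each position i, return the token that followed the most recent
    earlier occurrence of tokens[i], or None if there is no earlier copy.
    Hypothetical helper for illustration; not taken from the source paper.
    """
    predictions = []
    for i, tok in enumerate(tokens):
        # Step 1: find the position of the most recent earlier copy
        # of the current token (the "pointer").
        prev = None
        for j in range(i - 1, -1, -1):
            if tokens[j] == tok:
                prev = j
                break
        if prev is None:
            predictions.append(None)
            continue
        # Step 2: pointer arithmetic, shift the position forward by one.
        target = prev + 1  # always <= i, so the lookup stays causal
        # Step 3: attend to that position and copy its token.
        predictions.append(tokens[target])
    return predictions


example = induction_predict(["A", "B", "C", "A", "B"])
print(example)
```

In the K-composition variant, by contrast, the shift is done on the key side: a previous-token head writes each token's predecessor into the residual stream, so the induction head can match on token content without ever computing a position.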

Source paper

extracted_from: A Mathematical Framework for Transformer Circuits (2021)
