hypothesis
active
hypothesis:gpt-2-implements-at-least-one-induction-head-using-pointer-arithmetic-on-positional-embeddings-rather-than-k-composition
GPT-2 implements at least one induction head using pointer arithmetic on positional embeddings rather than K-composition
Observation of an alternative algorithm for implementing induction heads, found in larger models that carry positional embeddings in the residual stream
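The pointer-arithmetic idea can be illustrated at the token level. The sketch below is a toy under stated assumptions, not the model's actual computation: it assumes an earlier head finds the most recent previous copy of the current token and copies that copy's position, and that the induction head then shifts this position forward by one and reads the token there. The function name `pointer_arithmetic_induction` and the list-based representation are hypothetical and chosen only for illustration; real heads operate on embedding vectors with soft attention.

```python
from typing import List, Optional


def pointer_arithmetic_induction(tokens: List[str]) -> List[Optional[str]]:
    """Toy, token-level illustration of induction via position pointers.

    For each position i, return the token the sketch would predict next,
    or None when the current token has not appeared earlier.
    """
    predictions: List[Optional[str]] = []
    for i, tok in enumerate(tokens):
        # Step 1 (assumed earlier head): locate previous copies of the
        # current token and keep the position of the most recent one.
        earlier = [j for j in range(i) if tokens[j] == tok]
        if not earlier:
            predictions.append(None)
            continue
        pointer = earlier[-1]

        # Step 2 (assumed induction head): shift the copied position one
        # step forward and attend to the token stored there.
        # pointer < i, so pointer + 1 <= i is always a valid index.
        target = pointer + 1
        predictions.append(tokens[target])
    return predictions


if __name__ == "__main__":
    print(pointer_arithmetic_induction(["A", "B", "C", "A", "B"]))
```

The contrast with K-composition in this toy is where the "shift by one" happens: here it is applied to a position pointer on the query side, rather than by having keys carry information about the preceding token.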
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The mechanistic explanation of how induction heads are implemented in two-layer models
- Forward-looking claim connecting toy model findings to large-scale language models
- Early large language model cited as an example of transformer-based LLMs
- Cited as causal intervention methodology precedent for this paper's ablation approach
- Case Study I demonstrating pyvene can replicate a major interpretability result compactly
- Central empirical claim of the paper; induction heads are shown to be the mechanism for powerful in-context learning
- Load-bearing demonstration of pyvene's conciseness for complex replication tasks
- Quantitative verification that the copying and matching structure predicted by the mechanistic theory is present in all observed induction heads