Pointer Arithmetic in Transformers

A mechanism in models with positional embeddings where attention heads can manipulate and copy positional information to implement algorithms like induction heads

Neighborhood — ranked by edge-count

Concepts (1)

concept

Rotary Positional Embedding (RoPE)
associated_with
A positional encoding mechanism that doesn't put positional information into the residual stream; changes which pointer arithmetic algorithms are available

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Transformers use an anti-Markovian solution that recomputes relevant numeric information at each step in the Multi-Object taskclaim0.734
Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.
parallel processing paths in transformersconcept0.711
Spontaneously reported experience of multiple simultaneous processing streams, observed even in base models.
Transformers almost surely maintain input-injectivity throughout training, not just at initialisationhypothesis0.708
Conjecture supported by Nikolaou et al. 2025 for last-token hidden states
A Mathematical Framework for Transformer Circuits (Elhage et al., 2021)concept0.707
Foundational mechanistic interpretability paper on transformer circuit analysis
Information Flow in Transformersconcept0.706
A Mathematical Framework for Transformer Circuitsframework0.694
Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates current MLP decomposition
Transformers learn in-context by gradient descent, functioning as mesa-optimizers that learn internal models in real timefinding0.690
Evidence that in-context learning is not mere pattern matching but genuine optimization, relevant to applying the thesis to inference
intention checking peaks at ~1/2 depth in transformersfinding0.687
Lindsey (2026) found that intention checking accuracy peaks around half the network depth.