concept
active
concept:a-mathematical-framework-for-transformer-circuits-elhage-et-al-2021
A Mathematical Framework for Transformer Circuits (Elhage et al., 2021)
Foundational mechanistic interpretability paper on transformer circuit analysis
Neighborhood — ranked by edge-count
Papers (1)
paper
Venues (1)
venue
- Anthropic's mechanistic interpretability research blog where this paper was published.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates current MLP decomposition
- Mechanistic interpretability framework for understanding neural network computation as circuits of features
- Learning to encode position for transformer with continuous dynamical model (Liu et al., 2020) · concept · 0.769 · Prior work on learned dynamic position encodings; cited alongside Wang et al. as precedent.
- Transformer can be viewed as a Wolfram causal graph with foliations specifying computation order. · claim · 0.760 · Janus's interpretive framing of transformers as causal graphs.
- Neural network architecture based on attention, commonly used in large language models
- Prior finding from Grant et al. 2025 used to interpret low MAS IIA for GRU-Transformer hidden state comparisons.
- Antra's revision of her earlier model; still considers interference between levels important.
- Base architecture of reasoning LLMs studied, with attention and MLP blocks per layer