A Mathematical Framework for Transformer Circuits

Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates current MLP decomposition

Neighborhood — ranked by edge-count

paper

method

Logit Weight Analysis
extends
Computing each feature's linear effect on output token logits via path expansion through MLP output weights and unembedding matrix
Path Expansion Method
uses
The core analytical technique of expanding transformer computations from layer-by-layer products into sums of end-to-end path terms for independent analysis
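Logit weight analysis, as described above, can be illustrated with a minimal NumPy sketch. It assumes a hypothetical MLP whose output weights `W_out` (shape `d_model × n_features`) write each feature's activation into the residual stream, and an unembedding `W_U` (shape `n_vocab × d_model`). It keeps only the direct path from the MLP to the logits, ignoring later layers, biases, and the final LayerNorm. All sizes and names are illustrative, not taken from any particular model.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
d_model, n_features, n_vocab = 16, 8, 50
rng = np.random.default_rng(0)

W_out = rng.normal(size=(d_model, n_features))  # feature -> residual stream (assumed shape)
W_U = rng.normal(size=(n_vocab, d_model))       # residual stream -> logits

def logit_weights(W_U, W_out):
    """Direct-path logit weights: entry [t, i] is the change in token t's logit
    per unit activation of feature i (later layers and LayerNorm ignored)."""
    return W_U @ W_out  # shape (n_vocab, n_features)

def top_tokens(weights, feature, k=5):
    """Indices of the k tokens whose logits `feature` increases the most."""
    col = weights[:, feature]
    return np.argsort(col)[::-1][:k]

L = logit_weights(W_U, W_out)

# Linearity check: summing per-feature path terms reproduces the logits obtained
# by first writing the whole MLP output into the residual stream.
f = rng.normal(size=n_features)  # feature activations
via_residual = W_U @ (W_out @ f)
via_paths = sum(L[:, i] * f[i] for i in range(n_features))
print(np.allclose(via_residual, via_paths))  # True
print(top_tokens(L, feature=3))
```

Because this path is linear, each column of `W_U @ W_out` can be read on its own, which is what makes per-feature analysis possible once the computation is expanded into paths.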

concept

Residual Stream
implements
Proposed pathway flowing vertically through the layers at each position; attention heads read from it to compute the K/V values that carry information horizontally across positions.
Path Expansion Trick
implements
The mathematical trick of expanding a product of layer terms into a sum of end-to-end path terms, enabling independent analysis of each term
Graph Neural Network (GNN) Analogy to Transformers
analogous_to
Informal analogy mentioned by Joshi treating attention patterns as weights on a graph, framing transformer tensor products as graph convolutions
Tensor Product / Kronecker Product Notation
implements
Mathematical notation used throughout the paper to express operations that simultaneously act on position dimensions and vector dimensions of activations
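The tensor-product notation and the path expansion trick can be checked numerically on a one-layer attention-only transformer. There, the logits decompose as $T = \mathrm{Id} \otimes W_U W_E + \sum_h A^h \otimes W_U W_{OV}^h W_E$: a direct path plus one end-to-end term per head. The sketch below assumes that $(A \otimes W)\,x = A\,x\,W^\top$ for a residual-stream matrix $x$ of shape `n_ctx × d_model`, and holds the attention patterns `A` fixed (random and row-stochastic here). Biases and LayerNorm are omitted, and all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctx, d_model, n_vocab, n_heads = 4, 8, 10, 2  # hypothetical sizes

def tensor_apply(A, W, x):
    """(A ⊗ W) x: A mixes positions (rows of x); W acts on each position's vector.
    x has shape (n_ctx, d_in); the result has shape (n_ctx, d_out)."""
    return A @ x @ W.T

W_E = rng.normal(size=(d_model, n_vocab))  # embedding
W_U = rng.normal(size=(n_vocab, d_model))  # unembedding
W_OV = [rng.normal(size=(d_model, d_model)) for _ in range(n_heads)]  # W_O W_V per head

# Attention patterns held fixed (each row sums to 1).
A = []
for _ in range(n_heads):
    s = rng.random((n_ctx, n_ctx))
    A.append(s / s.sum(axis=1, keepdims=True))

tokens = np.eye(n_vocab)[rng.integers(0, n_vocab, size=n_ctx)]  # one-hot, (n_ctx, n_vocab)
I = np.eye(n_ctx)

# Layer by layer: embed, add each head's output to the residual stream, unembed.
x0 = tensor_apply(I, W_E, tokens)
resid = x0 + sum(tensor_apply(A[h], W_OV[h], x0) for h in range(n_heads))
logits_layerwise = tensor_apply(I, W_U, resid)

# Path expansion: direct path plus one end-to-end term per head.
direct = tensor_apply(I, W_U @ W_E, tokens)
head_paths = [tensor_apply(A[h], W_U @ W_OV[h] @ W_E, tokens) for h in range(n_heads)]
logits_paths = direct + sum(head_paths)

print(np.allclose(logits_layerwise, logits_paths))  # True
```

Each element of `head_paths` can then be inspected independently, for example by studying the `n_vocab × n_vocab` matrix `W_U @ W_OV[h] @ W_E`.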

framework

Distill Circuits Thread
extends
Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
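The page does not say how entity embeddings are produced or how typed edges are stored. The sketch below is therefore a hypothetical reconstruction of the "cosine ≥ 0.65 · no typed edge" filter. It assumes precomputed vectors in a dict and treats typed edges as undirected pairs. The function name and the toy data are illustrative only.

```python
import numpy as np

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical filter: entities whose embedding has cosine >= threshold with
    `target` but which share no typed edge with it, sorted by cosine, highest first.
    `typed_edges` is assumed to be a set of frozenset({a, b}) pairs (direction ignored)."""
    t = embeddings[target] / np.linalg.norm(embeddings[target])
    out = []
    for name, v in embeddings.items():
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        cos = float(t @ (v / np.linalg.norm(v)))
        if cos >= threshold:
            out.append((name, cos))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Toy data: a near-duplicate, an already-linked entity, and an unrelated one.
embeddings = {
    "paper": np.array([1.0, 0.0, 0.0]),
    "dup": np.array([0.95, 0.1, 0.0]),    # similar, no typed edge -> reported
    "linked": np.array([0.9, 0.2, 0.0]),  # similar, but already has a typed edge
    "far": np.array([0.0, 1.0, 0.0]),     # dissimilar
}
typed = {frozenset(("paper", "linked"))}
result = untyped_neighbors("paper", embeddings, typed)
print([name for name, _ in result])  # ['dup']
```

Excluding pairs that already have a typed edge is what turns this list into a queue of candidate new edges or unrecognized duplicates rather than a repeat of the neighborhood above.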

A Mathematical Framework for Transformer Circuits (Elhage et al., 2021) · concept · 0.939
Foundational mechanistic interpretability paper on transformer circuit analysis
Circuits Framework · framework · 0.813
Mechanistic interpretability framework for understanding neural network computation as circuits of features
transformer architecture · framework · 0.761
Neural network architecture based on attention, commonly used in large language models
Technical Dimensions Framework · framework · 0.751
A systematic framework for analyzing and comparing programming systems along multiple independent axes, proposed as a common language for programming systems research.
Information Theoretic Framework · framework · 0.750
Framework · concept · 0.742
1984 Ashton-Tate integrated system with frames, FRED language, and overlapping windows; design reference for Playground's approach.
Transformer decoder architecture · framework · 0.739
Base architecture of the reasoning LLMs studied, with attention and MLP blocks in each layer
Learning to encode position for transformer with continuous dynamical model (Liu et al., 2020) · concept · 0.731
Prior work on learned dynamic position encodings; cited alongside Wang et al. as precedent.