framework
active
framework:a-mathematical-framework-for-transformer-circuits
A Mathematical Framework for Transformer Circuits
Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates current MLP decomposition
Neighborhood — ranked by edge-count
Papers (2)
paper
Methods (2)
method
- Logit Weight Analysis · extends · Computing each feature's linear effect on output token logits via path expansion through the MLP output weights and the unembedding matrix
- The core analytical technique of expanding transformer computations from layer-by-layer products into sums of end-to-end path terms for independent analysis
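The logit weight computation described in the Methods list can be sketched numerically. This is a minimal illustration under stated assumptions, not the paper's code: the matrix names `W_out` and `W_U`, the toy sizes, and the helper `top_tokens` are all hypothetical, and only the direct path (MLP output weights followed by the unembedding) is considered.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_mlp, n_vocab = 8, 16, 50  # hypothetical toy sizes

# Assumed shapes: row-vector convention, activations multiply from the left.
W_out = rng.normal(size=(d_mlp, d_model))  # MLP unit -> residual stream
W_U = rng.normal(size=(d_model, n_vocab))  # residual stream -> token logits

# Direct-path logit weights: entry [i, t] is the linear effect of
# one unit of activation on MLP unit i on the logit of token t.
logit_weights = W_out @ W_U  # shape (d_mlp, n_vocab)

def top_tokens(unit, k=3):
    """Indices of the k tokens whose logits this unit raises the most."""
    return np.argsort(logit_weights[unit])[::-1][:k]
```

Reading a row of `logit_weights` gives one unit's effect on every token at once, which is why the product can be inspected on its own without running the model.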
Concepts (4)
concept
- Residual Stream · implements · Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
- Path Expansion Trick · implements · The mathematical trick of expanding a product of layer terms into a sum of end-to-end path terms, enabling independent analysis of each term
- Informal analogy mentioned by Joshi treating attention patterns as weights on a graph, framing transformer tensor products as graph convolutions
- Mathematical notation used throughout the paper to express operations that simultaneously act on position dimensions and vector dimensions of activations
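The path expansion idea listed above can be checked on a toy example. The sketch below assumes two residual layers that each act linearly, which for attention layers holds only when the attention pattern is treated as fixed; the matrices `A1` and `A2` and the path labels are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
I = np.eye(d)
A1 = rng.normal(size=(d, d))  # hypothetical linear map of layer 1
A2 = rng.normal(size=(d, d))  # hypothetical linear map of layer 2

# Product form: each residual layer computes x -> x @ (I + A).
product_form = (I + A1) @ (I + A2)

# Sum form: one term per end-to-end path through the two layers.
paths = {
    "direct": I,                       # skips both layers
    "layer1_only": A1,                 # through layer 1, skips layer 2
    "layer2_only": A2,                 # skips layer 1, through layer 2
    "layer1_then_layer2": A1 @ A2,     # through both layers in order
}
sum_form = sum(paths.values())

assert np.allclose(product_form, sum_form)
```

Because the two forms are equal, each entry of `paths` can be studied separately, for example by comparing its norm to the others.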
Frameworks (1)
framework
- Distill Circuits Thread · extends · Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Foundational mechanistic interpretability paper on transformer circuit analysis
- Mechanistic interpretability framework for understanding neural network computation as circuits of features
- Neural network architecture based on attention, commonly used in large language models
- A systematic framework for analyzing and comparing programming systems along multiple independent axes, proposed as a common language for programming systems research.
- 1984 Ashton-Tate integrated system with frames, FRED language, and overlapping windows; design reference for Playground's approach.
- Base architecture of reasoning LLMs studied, with attention and MLP blocks per layer
- Learning to encode position for transformer with continuous dynamical model (Liu et al., 2020) · concept · 0.731 · Prior work on learned dynamic position encodings; cited alongside Wang et al. as precedent.