framework
active
framework:a-mathematical-framework-for-transformer-circuits

A Mathematical Framework for Transformer Circuits

Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; motivates the current work's MLP decomposition

Neighborhood — ranked by edge-count

Methods (2)

method
  • Computing each feature's linear effect on output token logits via path expansion through MLP output weights and unembedding matrix
  • The core analytical technique of expanding transformer computations from layer-by-layer products into sums of end-to-end path terms for independent analysis
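The first Methods entry above can be illustrated with a minimal NumPy sketch of a feature's direct logit effect. It assumes that each feature is a direction in the MLP's hidden space, uses random placeholder weights and a hypothetical helper `feature_logit_effects`, and keeps only the direct path (no final LayerNorm, no later layers). It is an illustration, not the implementation used in the work this entry refers to.

```python
import numpy as np

def feature_logit_effects(W_out, W_U, feature_dirs):
    """Hypothetical helper: direct linear effect of each feature on every output logit.

    W_out:        (d_model, d_mlp)    MLP output weights (write into the residual stream)
    W_U:          (n_vocab, d_model)  unembedding matrix
    feature_dirs: (d_mlp, n_features) each column is one feature's direction in MLP space
    Returns:      (n_vocab, n_features) logit change per unit of feature activation
    """
    # Path: feature direction -> MLP output weights -> residual stream -> unembedding.
    # Assumption: only the direct path is kept; final LayerNorm and later layers are ignored.
    return W_U @ (W_out @ feature_dirs)

# Random placeholder weights standing in for a trained model.
rng = np.random.default_rng(0)
d_model, d_mlp, n_vocab, n_features = 8, 32, 50, 4
W_out = rng.normal(size=(d_model, d_mlp))
W_U = rng.normal(size=(n_vocab, d_model))
feature_dirs = rng.normal(size=(d_mlp, n_features))

effects = feature_logit_effects(W_out, W_U, feature_dirs)
top_tokens = np.argsort(-effects[:, 0])[:5]  # indices of the tokens feature 0 most promotes
print(effects.shape, top_tokens)
```

Because the path is linear, each column of `effects` can be read on its own, for example by sorting it to find the tokens a feature most promotes or suppresses.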

Concepts (4)

concept
  • Residual Stream
    implements
    Proposed pathway flowing through the layers at each position; the keys and values computed from it feed the horizontal (cross-position) information flow.
  • The mathematical trick of expanding a product of layer terms into a sum of end-to-end path terms, enabling independent analysis of each term
  • Informal analogy, mentioned by Joshi, that treats attention patterns as edge weights on a graph and frames the transformer's tensor products as graph convolutions
  • Mathematical notation used throughout the paper to express operations that simultaneously act on position dimensions and vector dimensions of activations
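The path-expansion and tensor-product entries above can be checked numerically on a toy model. The sketch below assumes a one-layer attention-only transformer with fixed attention patterns, no biases or LayerNorm, and row-major flattening of [position, d_model] activations, so that A ⊗ W corresponds to `np.kron(A, W)` and acts on an activation matrix X as A X Wᵀ. All weights are random placeholders. It verifies that the layer-by-layer product (Id ⊗ W_U)(Id + Σ_h A^h ⊗ W_OV^h)(Id ⊗ W_E) equals the sum of end-to-end path terms Id ⊗ W_U W_E + Σ_h A^h ⊗ W_U W_OV^h W_E.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctx, n_vocab, d_model, n_heads = 4, 6, 5, 2

# Random placeholder weights; shapes follow the column-vector convention x_out = W @ x_in.
W_E = rng.normal(size=(d_model, n_vocab))  # token embedding
W_U = rng.normal(size=(n_vocab, d_model))  # unembedding
W_OV = [rng.normal(size=(d_model, d_model)) for _ in range(n_heads)]  # W_O @ W_V per head

# Fixed causal attention patterns whose rows sum to 1. Holding A fixed is what makes
# the model linear in the tokens; in a real model A is computed by the QK circuit.
A = []
for _ in range(n_heads):
    raw = np.tril(rng.random((n_ctx, n_ctx)) + 0.1)
    A.append(raw / raw.sum(axis=1, keepdims=True))

I_ctx, I_d = np.eye(n_ctx), np.eye(d_model)

# Product form: (Id ⊗ W_U) (Id + Σ_h A^h ⊗ W_OV^h) (Id ⊗ W_E).
middle = np.kron(I_ctx, I_d) + sum(np.kron(A_h, W_h) for A_h, W_h in zip(A, W_OV))
product = np.kron(I_ctx, W_U) @ middle @ np.kron(I_ctx, W_E)

# Expanded form via the mixed-product rule: direct path plus one end-to-end term per head.
direct = np.kron(I_ctx, W_U @ W_E)
head_terms = [np.kron(A_h, W_U @ W_h @ W_E) for A_h, W_h in zip(A, W_OV)]
expanded = direct + sum(head_terms)

# Cross-check against an ordinary layer-by-layer forward pass on one-hot tokens.
tokens = np.eye(n_vocab)[rng.integers(0, n_vocab, size=n_ctx)]  # [n_ctx, n_vocab]
x0 = tokens @ W_E.T                                             # [n_ctx, d_model]
x1 = x0 + sum(A_h @ x0 @ W_h.T for A_h, W_h in zip(A, W_OV))    # residual stream after the heads
logits = x1 @ W_U.T                                             # [n_ctx, n_vocab]

print(np.allclose(product, expanded))
print(np.allclose(expanded @ tokens.reshape(-1), logits.reshape(-1)))
```

Each entry of `head_terms` can then be studied on its own: its position factor A^h says where a head moves information, and its vector factor W_U W_OV^h W_E says how an attended token changes the output logits.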

Frameworks (1)

framework
  • Prior mechanistic interpretability work reverse-engineering vision models (InceptionV1); the direct predecessor this paper extends to language models

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.