Path Expansion Trick

The mathematical trick of expanding a product of per-layer terms into a sum of end-to-end path terms, so that each term can be analyzed independently.
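A minimal sketch of the trick, assuming each residual layer acts linearly on the stream as x -> x + W_i x (true, for example, for attention-only layers once attention patterns are held fixed; nonlinear layers do not expand this cleanly). Multiplying out (I + W_L) ... (I + W_1) one layer at a time gives the same matrix as summing one term per subset of layers, where each subset is an end-to-end path and the empty subset is the pure skip connection. The helper names (`layer_by_layer`, `path_expansion`) are hypothetical illustrations.

```python
import random
from itertools import combinations


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_by_layer(weights):
    """Multiply out (I + W_L) ... (I + W_1), one residual layer at a time."""
    n = len(weights[0])
    total = identity(n)
    for w in weights:
        # A later layer acts on the result so far, so it multiplies on the left.
        total = matmul(matadd(identity(n), w), total)
    return total


def path_expansion(weights):
    """Sum one term per subset of layers; each subset is one end-to-end path."""
    n = len(weights[0])
    total = [[0.0] * n for _ in range(n)]
    paths = 0
    for k in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), k):
            term = identity(n)  # the empty subset is the pure skip-connection path
            for i in subset:  # indices come out ascending, i.e. in layer order
                term = matmul(weights[i], term)
            total = matadd(total, term)
            paths += 1
    return total, paths


random.seed(0)
ws = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(4)]
expanded, n_paths = path_expansion(ws)
product = layer_by_layer(ws)
print(n_paths)  # 2**4 = 16 paths for 4 layers
print(max(abs(x - y) for ra, rb in zip(product, expanded) for x, y in zip(ra, rb)))
```

Enumerating subsets makes the cost grow as 2^L, which is why the expansion is an analytical device for inspecting individual path terms rather than a faster way to run the model.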

Neighborhood (ranked by edge count)

thinker

Dong et al.
studies
Prior work that considered paths through a self-attention network when analyzing transformer expressivity, deriving the same path expansion structure.

framework

A Mathematical Framework for Transformer Circuits
implements
Prior Anthropic paper enabling circuit-level analysis of attention-only transformers; it motivates the MLP decomposition in the current work.

claim

The residual stream has a deeply linear structure, enabling virtual weights and path expansion analysis
supports
Architectural observation underlying the entire mathematical framework: the residual stream is simply a sum of the embedding and every layer's output, each read and written through linear projections.
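A minimal sketch of why this linearity yields virtual weights, using hypothetical sizes and random matrices. Earlier layers write their outputs into the stream through write projections W_out_i; a later layer reads the whole stream through W_in. Because the stream is just a sum, reading it once equals summing each writer's contribution passed through the product W_in W_out_i, the "virtual weight" linking that writer directly to the reader. The embedding term is omitted here; it would simply be one more writer. Function names are illustrative assumptions.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def vecadd(a, b):
    return [x + y for x, y in zip(a, b)]


def read_from_stream(w_in, w_outs, outputs):
    """Build the residual stream as a plain sum of writes, then read it once."""
    stream = [0.0] * len(w_outs[0])
    for w_out, h in zip(w_outs, outputs):
        stream = vecadd(stream, matvec(w_out, h))  # each earlier layer adds its write
    return matvec(w_in, stream)


def read_via_virtual_weights(w_in, w_outs, outputs):
    """Skip the stream: connect each writer to the reader with W_in @ W_out_i."""
    result = [0.0] * len(w_in)
    for w_out, h in zip(w_outs, outputs):
        virtual = matmul(w_in, w_out)  # d_head x d_head virtual weight
        result = vecadd(result, matvec(virtual, h))
    return result


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(1)
d_model, d_head = 4, 2  # hypothetical sizes
w_outs = [rand_matrix(d_model, d_head) for _ in range(2)]  # two earlier writers
outputs = [[random.uniform(-1, 1) for _ in range(d_head)] for _ in range(2)]
w_in = rand_matrix(d_head, d_model)  # one later reader
direct = read_from_stream(w_in, w_outs, outputs)
virtual = read_via_virtual_weights(w_in, w_outs, outputs)
```

The two functions agree up to floating-point rounding, so each writer-to-reader connection can be studied through its own small virtual weight matrix without reference to the rest of the stream.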

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Path Expansion Method · method · 0.865
The core analytical technique of expanding transformer computations from layer-by-layer products into sums of end-to-end path terms for independent analysis
Path Patching · method · 0.766
Method by Goldowsky-Dill et al. 2023 for localizing model behavior via targeted activation interventions
Path Integration · concept · 0.729
Neural mechanism for tracking location through accumulation of self-movement vectors; shown to play the role of position encodings in TEM.
How can we get the land for the path system to grow? · question · 0.706
Specific implementation question about land acquisition for the pedestrian hull.
exponential path combinatorics · concept · 0.694
The number of distinct paths along which information can travel from point A to point B in a transformer is C(m+n, n), which quickly exceeds the number of atoms in the universe.
Representation-based Path · concept · 0.691
The path in activation space derived by fitting the representation manifold, used to steer along the geometric structure of internal representations.
The number of distinct information routes from A to B can exceed C(m+n, n), where m = position displacement and n = layer displacement. · finding · 0.686
Quantifies extreme redundancy in transformer routing; supports the claim that introspection and interference patterns are architecturally permitted.
Sum-Over-Paths Method · concept · 0.683
Feynman's quantum method in which the global behavior of light emerges from local behaviors, each assigned a probability amplitude; cited as an example of global behavior emerging from local rules.
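The C(m+n, n) route count in the exponential path combinatorics entries above can be checked with a short sketch. It assumes a simplified grid model (an assumption; the source gives only the formula): each route is a sequence of m position steps and n layer steps taken in some order, so counting routes means counting orderings. A brute-force recursion is compared with `math.comb`, and a helper (hypothetical, for illustration) finds the smallest square grid whose route count exceeds 10**80, a commonly quoted rough estimate of the number of atoms in the observable universe.

```python
import math
from functools import lru_cache

ATOMS_IN_UNIVERSE = 10 ** 80  # rough, commonly quoted estimate; not from the source


@lru_cache(maxsize=None)
def count_routes(m: int, n: int) -> int:
    """Count routes made of m position steps and n layer steps, by brute force."""
    if m == 0 or n == 0:
        return 1  # all remaining steps are of one kind, so only one ordering
    # The first step is either a position step or a layer step.
    return count_routes(m - 1, n) + count_routes(m, n - 1)


def closed_form(m: int, n: int) -> int:
    return math.comb(m + n, n)


def smallest_square_grid_exceeding(limit: int) -> int:
    """Smallest n such that an n-by-n grid (m = n) has more than `limit` routes."""
    n = 0
    while math.comb(2 * n, n) <= limit:
        n += 1
    return n


print(count_routes(3, 2), closed_form(3, 2))  # both 10
print(smallest_square_grid_exceeding(ATOMS_IN_UNIVERSE))
```

Because C(2n, n) grows roughly like 4^n, only a modest number of positions and layers is needed before the count passes any physical bound, which is the point the entries above make.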