Superposition in Neural Networks

Theoretical model of how neural networks encode more features than they have dimensions, which informs work on linear representations.

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
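The filter described above (cosine at or above 0.65, no typed edge to this entity) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names, the toy embeddings, and the representation of typed edges as a set of name pairs are all assumptions, and ranking the result by cosine score is likewise an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities similar to `target` that share no typed edge with it.

    embeddings  -- dict mapping entity name to its embedding vector (assumed layout)
    typed_edges -- set of (name, name) pairs that already have a typed relation
    """
    results = []
    for name, vector in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
toy_embeddings = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],  # very close to A
    "C": [0.0, 1.0, 0.0],  # orthogonal to A, below the threshold
    "D": [1.0, 1.0, 0.0],  # similar to A, but already has a typed edge
    "E": [2.0, 1.0, 0.0],  # similar to A, no typed edge
}
toy_typed_edges = {("A", "D")}

candidates = untyped_neighbors("A", toy_embeddings, toy_typed_edges)
for name, score in candidates:
    print(f"{name}  {score:.3f}")
```

Entities returned this way are the ones worth reviewing as either missing edges or duplicates of the target.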

Superposition · concept · 0.844
Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
Memorization in Superposition · concept · 0.817
Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons.
Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.805
Explanation for why dictionary learning can recover many more features than dimensions.
Neural Networks · concept · 0.805
Cross-layer superposition · concept · 0.799
Representation of features spread across multiple layers, complicating dictionary learning.
Superposition Hypothesis · framework · 0.795
Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition.
Superposition is in some sense deliberate: the model converts pure neurons into polysemantic neurons to store more features in fewer neurons. · claim · 0.782
Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy.
Superposition in Residual Stream · concept · 0.779
The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces.
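
The idea shared by the entries above, more features than dimensions stored along almost-orthogonal directions, can be illustrated with a small sketch. It assumes random unit vectors as stand-ins for learned feature directions and a simple dot-product readout with a fixed threshold; the sizes (1024 features in 512 dimensions), the function names, and the chosen active features are all hypothetical and picked only for illustration.

```python
import math
import random


def random_unit_vectors(n_features, n_dims, seed=0):
    """Draw n_features random directions in n_dims dimensions, each scaled to unit length."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n_features):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encode(active, directions):
    """Superpose the directions of the active features into one activation vector."""
    hidden = [0.0] * len(directions[0])
    for i in active:
        for j, x in enumerate(directions[i]):
            hidden[j] += x
    return hidden


def decode(hidden, directions, threshold=0.5):
    """Read every feature back with a dot product; keep those above the threshold."""
    return {i for i, w in enumerate(directions) if dot(hidden, w) > threshold}


N_DIMS, N_FEATURES = 512, 1024  # twice as many features as dimensions
directions = random_unit_vectors(N_FEATURES, N_DIMS)

# Interference: how far feature 0 is from being orthogonal to every other feature.
max_interference = max(abs(dot(directions[0], w)) for w in directions[1:])

# A sparse input: only a few features are active at once.
active = {3, 141, 592}
recovered = decode(encode(active, directions), directions)

print("max |cosine| between feature 0 and the others:", round(max_interference, 3))
print("recovered active features:", sorted(recovered))
```

The readout works here because only a few features are active at once, so the interference from overlapping directions stays small relative to the signal. With many simultaneously active features the same readout would start to confuse them.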