concept
active
concept:superposition-in-neural-networks
Superposition in Neural Networks
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
- Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite having few neurons.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions.
  hypothesis · 0.805 · Explanation for why dictionary learning can recover many more features than dimensions.
- Representation of features spread across multiple layers, complicating dictionary learning.
- Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition.
- Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy.
- Phenomenon where the residual stream communicates many more features than it has dimensions by encoding information across overlapping subspaces.
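
Several of the entries above describe the same mechanism: more features than dimensions, stored along almost-orthogonal directions. The following is a minimal sketch of that idea, not taken from any of the listed entities. It assumes features are random unit vectors, that only a few are active at once, and that a feature is read off by a dot product with a hand-picked threshold of 0.5; the sizes and all function names are illustrative assumptions.

```python
import math
import random


def random_unit_vectors(n_features, n_dims, seed=0):
    """Sample n_features random directions on the unit sphere in n_dims dimensions."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n_features):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_interference(vectors):
    """Largest |cosine| between any two distinct feature directions."""
    return max(
        abs(dot(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )


def encode(active, vectors):
    """Superpose the directions of the active features into one activation vector."""
    x = [0.0] * len(vectors[0])
    for i in active:
        for k, value in enumerate(vectors[i]):
            x[k] += value
    return x


def decode(x, vectors, threshold=0.5):
    """Read each feature off by projecting onto its direction and thresholding."""
    return {i for i, v in enumerate(vectors) if dot(x, v) > threshold}


# Illustrative sizes (assumed): 300 features packed into 256 dimensions.
N_FEATURES, N_DIMS = 300, 256
directions = random_unit_vectors(N_FEATURES, N_DIMS)
worst = max_interference(directions)

active = {3, 57, 120}  # a sparse set of active features
recovered = decode(encode(active, directions), directions)

print("max |cosine| between directions:", round(worst, 3))
print("recovered == active:", recovered == active)
```

The sketch relies on sparsity: each active feature adds a small amount of interference to every other readout, so recovery degrades as more features are active at once or as the number of dimensions shrinks.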