Memorization in Superposition

Specific phrases or sequences are memorized via binary features stored in superposition, enabling narrow pattern matching with only a few neurons.
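
As a rough illustration of binary features sharing a small number of neurons, the sketch below packs five on/off features into a two-neuron activation by giving each feature its own unit direction. The feature count, the evenly spaced directions, the 0.5 readout threshold, and the names `DIRECTIONS`, `encode`, and `decode` are all assumptions chosen for the example, not details taken from this entry.

```python
import math

N_FEATURES = 5  # hypothetical: five binary features, two neurons

# Each feature is a unit direction in the 2-D activation space,
# evenly spaced around the circle (an assumed, illustrative layout).
DIRECTIONS = [
    (math.cos(2 * math.pi * k / N_FEATURES), math.sin(2 * math.pi * k / N_FEATURES))
    for k in range(N_FEATURES)
]

def encode(active):
    """Sum the directions of the active features into a 2-neuron activation."""
    return (
        sum(DIRECTIONS[k][0] for k in active),
        sum(DIRECTIONS[k][1] for k in active),
    )

def decode(h, threshold=0.5):
    """Project the activation onto each feature direction and threshold."""
    return {
        k for k, (dx, dy) in enumerate(DIRECTIONS)
        if dx * h[0] + dy * h[1] > threshold
    }

# A single active feature is always read back exactly.
for k in range(N_FEATURES):
    assert decode(encode({k})) == {k}

# Two non-adjacent features interfere: the readout reports feature 1 instead.
print(decode(encode({0, 2})))
```

In this toy setting the readout is exact only while activations stay sparse (one feature at a time). Once features 0 and 2 fire together, their overlap with feature 1's direction produces a false match, which is the interference cost of storing more features than neurons.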

Neighborhood — ranked by edge-count

framework

Superposition Hypothesis
associated_with
Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition

finding

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Superposition · concept · 0.842
Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
Superposition in Neural Networks · concept · 0.817
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
Superposition of Simulacra · concept · 0.800
The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
Simulacra in Superposition Framework · framework · 0.772
The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
Superposition is in some sense deliberate: the model converts pure neurons into polysemantic neurons to store more features in fewer neurons. · claim · 0.770
Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
Superposition of Sparse Features · concept · 0.763
Mechanistic finding by Bricken et al. 2023 about how LLMs store features; cited as operational justification for pattern-repository assumption
Superposition in Residual Stream · concept · 0.762
The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
Cross-layer superposition is a fundamental challenge for dictionary learning. · claim · 0.749
Features smeared across layers cannot be fully disentangled by SAE on a single residual stream.
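
The selection rule behind this list (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. This is a minimal sketch, not the system's actual implementation: the function name `untyped_neighbors`, the dictionary layout, and the two entries marked hypothetical are assumptions made for illustration.

```python
def untyped_neighbors(similarities, typed_targets, threshold=0.65):
    """Return (name, cosine) pairs at or above threshold that have no typed edge,
    sorted from most to least similar."""
    candidates = [
        (name, score)
        for name, score in similarities.items()
        if score >= threshold and name not in typed_targets
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

similarities = {
    "Superposition in Neural Networks": 0.817,
    "Superposition": 0.842,
    "Superposition Hypothesis": 0.900,  # hypothetical score, not shown on this page
    "Unrelated Entity": 0.300,          # hypothetical entry below the threshold
}
# Entities that already have a typed edge (here: associated_with).
typed_targets = {"Superposition Hypothesis"}

result = untyped_neighbors(similarities, typed_targets)
for name, score in result:
    print(f"{name} · {score:.3f}")
```

Entities that survive the filter are the ones worth reviewing by hand, either to add a typed edge or to merge as duplicates.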