concept · active
Memorization in Superposition (concept:memorization-in-superposition)
Specific phrases or sequences are memorized via binary features stored in superposition, enabling narrow pattern matching even though few neurons are involved.
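A minimal sketch of this mechanism, under loudly stated assumptions: each memorized phrase is treated as one binary feature with a random unit direction in a 512-dimensional hidden space, and 1,500 such features are packed in, so there are more features than dimensions. The phrase IDs, the sizes, and the 0.5 detection threshold are hypothetical choices for illustration, not values from the source, and real networks learn their directions rather than drawing them at random. Encoding sums the directions of the active features; detection projects the hidden vector onto each feature direction and thresholds the result.

```python
import math
import random


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Draw a direction uniformly on the unit sphere (normalized Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_feature_directions(n_features: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Assumption: one random direction per binary feature (trained models learn these)."""
    rng = random.Random(seed)
    return [random_unit_vector(dim, rng) for _ in range(n_features)]


def encode(active: set[int], directions: list[list[float]], dim: int) -> list[float]:
    """Superpose the directions of all active binary features into one hidden vector."""
    h = [0.0] * dim
    for i in active:
        for j, x in enumerate(directions[i]):
            h[j] += x
    return h


def detect(h: list[float], directions: list[list[float]], threshold: float = 0.5) -> set[int]:
    """Read out each feature by projecting onto its direction; 0.5 is a hypothetical cutoff."""
    return {i for i, w in enumerate(directions) if dot(h, w) > threshold}


DIM = 512          # hypothetical number of dimensions ("neurons")
N_FEATURES = 1500  # hypothetical feature count, deliberately larger than DIM
directions = make_feature_directions(N_FEATURES, DIM)

memorized = {3, 141, 592}  # hypothetical IDs of memorized phrases active in this context
h = encode(memorized, directions, DIM)
print(sorted(detect(h, directions)))
```

Each active feature projects to about 1 plus small cross-terms from the other active features, while inactive features see only those cross-terms. The readout therefore stays reliable only while few features are active at once, which fits the narrow pattern matching described above.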
Neighborhood (ranked by edge count)
Frameworks (1)
- Superposition Hypothesis (associated_with): core theoretical framework holding that neural networks represent more features than they have neurons by encoding features as directions in superposition
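The geometric intuition behind the framework can be checked with a small sketch. It assumes, purely for illustration, that feature directions are random unit vectors; the sizes (200 features in 100 dimensions) are hypothetical. For random directions in d dimensions the typical |cosine| between two of them is roughly sqrt(2/(π·d)), about 0.08 for d = 100, so twice as many directions as dimensions can coexist with only small pairwise interference, even though exact orthogonality is impossible.

```python
import itertools
import math
import random


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Draw a direction uniformly on the unit sphere (normalized Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def interference_stats(n_features: int, dim: int, seed: int = 0) -> tuple[float, float]:
    """Return (mean |cos|, max |cos|) over all pairs of random feature directions."""
    rng = random.Random(seed)
    dirs = [random_unit_vector(dim, rng) for _ in range(n_features)]
    # Vectors are unit length, so the dot product is the cosine similarity.
    cosines = [abs(sum(x * y for x, y in zip(a, b)))
               for a, b in itertools.combinations(dirs, 2)]
    return sum(cosines) / len(cosines), max(cosines)


# Hypothetical sizes: 200 features packed into 100 dimensions.
mean_cos, max_cos = interference_stats(n_features=200, dim=100)
print(f"mean |cos| = {mean_cos:.3f}, max |cos| = {max_cos:.3f}")
```

Increasing `dim` while keeping the feature-to-dimension ratio fixed shrinks the mean interference, which is one way to see why superposition becomes more attractive in wider representations.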
Findings (1)
- Demonstrates mechanistic memorization via feature assemblies in superposition
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
- Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
- The state in which a dialogue agent maintains multiple possible characters simultaneously, a set that is refined as the conversation proceeds
- The more nuanced second metaphor: an LLM as a simulator that maintains a superposition of possible simulacra across a multiverse of characters
- Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy
- Mechanistic finding by Bricken et al. (2023) about how LLMs store features; cited as operational justification for the pattern-repository assumption
- The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
- Features smeared across layers cannot be fully disentangled by a sparse autoencoder (SAE) trained on the residual stream at a single layer.