Isotropic Superposition Model

Prior model of superposition in which features are discrete 1D objects that repel each other roughly evenly; the paper argues that this model is incomplete
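As a toy illustration of "repelling each other roughly evenly", the sketch below places three feature directions in a two-dimensional space at equal angles and checks that every pair has the same cosine similarity. The helper names (`evenly_spaced_features`, `cosine`) and the choice of three features in two dimensions are assumptions made for this example, not a construction taken from the paper.

```python
import math

def evenly_spaced_features(n):
    """Return n unit vectors in 2D at equal angular spacing (hypothetical helper)."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

def cosine(u, v):
    """Cosine similarity between two vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Three features packed into two dimensions: more features than dimensions.
features = evenly_spaced_features(3)

# Cosine similarity for every distinct pair of feature directions.
pairwise = [
    cosine(features[i], features[j])
    for i in range(len(features))
    for j in range(i + 1, len(features))
]
print(pairwise)
```

Each pair sits 120 degrees apart, so all three similarities come out equal (about -0.5): no pair interferes more than any other, which is the sense in which the arrangement is isotropic.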

Neighborhood — ranked by edge-count

claim

framework

Feature Manifolds
extends
Hypothesized extension of superposition where features may be higher-dimensional manifolds rather than 1D directions

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Superposition Hypothesis · framework · 0.774
Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
Superposition · concept · 0.772
Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
Cross-layer superposition · concept · 0.744
Representation of features spread across multiple layers, complicating dictionary learning.
Superposition of Simulacra · concept · 0.744
The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
isometry · concept · 0.741
A distance-preserving transformation: translation, rotation, reflection, glide-reflection
Simulacra in Superposition Framework · framework · 0.730
The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
Superposition in Neural Networks · concept · 0.728
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
Memorization in Superposition · concept · 0.727
Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons