framework
active
framework:isotropic-superposition-model
Isotropic Superposition Model
Prior model of superposition in which features are discrete 1D objects (directions) that repel each other roughly evenly; the paper argues this picture is incomplete
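As a rough illustration of "discrete 1D objects repelling each other roughly evenly", the sketch below places five feature directions evenly on the unit circle and measures how much each overlaps with the first. The 2D setting, the count of five, and the helper names `evenly_spaced_directions` and `cosine` are assumptions chosen for illustration and do not come from the source.

```python
import math

def evenly_spaced_directions(n_features):
    """Return n_features unit vectors spaced evenly on the unit circle.

    Illustrative assumption: a 2D space with more features than dimensions.
    """
    return [
        (math.cos(2 * math.pi * k / n_features),
         math.sin(2 * math.pi * k / n_features))
        for k in range(n_features)
    ]

def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Five features packed into two dimensions.
directions = evenly_spaced_directions(5)

# Overlap (interference) of feature 0 with each of the other four features.
overlaps = [cosine(directions[0], d) for d in directions[1:]]
```

In this layout every feature sees the same pattern of overlaps with the others: cos(2π/5) ≈ 0.309 with its two nearest neighbors and cos(4π/5) ≈ -0.809 with the two farther ones. That uniformity is what "isotropic" refers to in this sketch.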
Neighborhood — ranked by edge-count
Claims (1)
claim
- Authors revise their own prior Toy Models framework based on evidence from feature splitting and geometry
Frameworks (1)
framework
- Feature Manifolds · extends · Hypothesized extension of superposition in which features may be higher-dimensional manifolds rather than 1D directions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
- Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
- Representation of features spread across multiple layers, complicating dictionary learning.
- The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
- A distance-preserving transformation: translation, rotation, reflection, glide-reflection
- The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
- Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
- Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
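The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's actual implementation: the function name `similarity_candidates`, the toy embeddings, and sorting by descending score are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_candidates(target_vec, others, typed_neighbors, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold.

    others: dict mapping entity id -> embedding vector (hypothetical data).
    typed_neighbors: set of ids that already have a typed edge to the target;
    these are skipped, matching the "no typed edge" condition.
    """
    hits = []
    for entity_id, vec in others.items():
        if entity_id in typed_neighbors:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((entity_id, score))
    # Assumption: most similar first; the source does not state the ordering.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy example with made-up 2D embeddings.
target = (1.0, 0.0)
others = {"a": (1.0, 0.1), "b": (0.0, 1.0), "c": (1.0, 1.0)}
candidates = similarity_candidates(target, others, typed_neighbors={"a"})
```

Here `a` is excluded because it already has a typed edge, `b` falls below the threshold, and only `c` remains as a candidate for a new edge or a duplicate check.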