Superposition Hypothesis

Core theoretical framework: neural networks represent more features than they have neurons by encoding features as directions in superposition.

Neighborhood — ranked by edge-count

paper

concept

Linear representation
supports
The idea that features are encoded as directions in activation space.
Polysemanticity
associated_with
Neurons that respond to multiple unrelated concepts, limiting interpretability.
Feature Sparsity
supports
Property that each feature activates on only a small fraction of inputs; this sparsity makes compressed-sensing-style recovery possible, which is what allows superposition to work.
Memorization in Superposition
associated_with
Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
Noisy Simulation of Sparse Networks
associated_with
Mechanism by which superposition works: small neural networks exploit sparsity to approximately simulate much larger sparse networks
Overcomplete Basis
associated_with
A set of feature directions containing more directions than the space has dimensions, enabling superposition.
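
The concept entries above (feature sparsity, overcomplete basis, features as directions) can be illustrated with a small numerical sketch. This is not taken from any paper in the graph; the function names, the sizes (200 features in 50 dimensions), and the use of random Gaussian directions are all assumptions chosen for illustration.

```python
import numpy as np

def random_unit_directions(n_features, n_dims, seed=0):
    """Hypothetical helper: one random unit vector per feature."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_features, n_dims))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def max_interference(directions):
    """Largest |cosine| between two distinct feature directions."""
    gram = directions @ directions.T
    np.fill_diagonal(gram, 0.0)  # ignore each direction's overlap with itself
    return float(np.abs(gram).max())

def encode(features, directions):
    """Sum the directions of the active features into one activation vector."""
    return features @ directions

def decode(activations, directions):
    """Linear readout: project the activation onto every feature direction."""
    return activations @ directions.T

# Overcomplete basis: 200 feature directions in a 50-dimensional space.
directions = random_unit_directions(n_features=200, n_dims=50)

# Sparse input: only 2 of the 200 features are active.
x = np.zeros(200)
x[[3, 77]] = 1.0

readout = decode(encode(x, directions), directions)
print("max interference:", max_interference(directions))
print("readout on active features:", readout[[3, 77]])
print("largest readout on an inactive feature:", np.delete(readout, [3, 77]).max())
```

Because the directions are only almost orthogonal, every readout picks up some interference from the other active features. With few active features that interference tends to stay small relative to the true signal, which is the intuition behind sparsity enabling superposition.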

claim

framework

Sparse Autoencoder for Dictionary Learning
implements
Primary method introduced: trains a one-hidden-layer MLP with an L1 sparsity penalty to decompose model activations into an overcomplete feature dictionary.
Disentanglement
contradicts
Related research agenda seeking representations that separate conceptually distinct factors; contrasted with the superposition approach.
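
The Sparse Autoencoder for Dictionary Learning entry above describes a one-hidden-layer MLP with an L1 sparsity penalty. A minimal sketch of the forward pass and loss follows, assuming a ReLU hidden layer, untied encoder and decoder weights, and a squared-error reconstruction term. These choices, the parameter names, and the `l1_coeff` default are illustrative assumptions, not the configuration of any particular paper.

```python
import numpy as np

def init_sae(d_model, d_dict, seed=0):
    """Hypothetical initializer; d_dict > d_model gives an overcomplete dictionary."""
    rng = np.random.default_rng(seed)
    return {
        "w_enc": rng.standard_normal((d_model, d_dict)) * 0.1,
        "b_enc": np.zeros(d_dict),
        "w_dec": rng.standard_normal((d_dict, d_model)) * 0.1,
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, acts):
    """Encode activations into non-negative codes, then reconstruct them."""
    codes = np.maximum(0.0, acts @ params["w_enc"] + params["b_enc"])  # ReLU
    recon = codes @ params["w_dec"] + params["b_dec"]
    return codes, recon

def sae_loss(params, acts, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes codes toward sparsity."""
    codes, recon = sae_forward(params, acts)
    mse = np.mean(np.sum((recon - acts) ** 2, axis=1))
    l1 = np.mean(np.sum(np.abs(codes), axis=1))
    return mse + l1_coeff * l1

# Usage: a batch of 8 stand-in activation vectors of width 16, dictionary of size 64.
params = init_sae(d_model=16, d_dict=64)
acts = np.random.default_rng(1).standard_normal((8, 16))
codes, recon = sae_forward(params, acts)
print(codes.shape, recon.shape, sae_loss(params, acts))
```

Training would minimize `sae_loss` over the parameters with any gradient-based optimizer; that step is omitted here to keep the sketch dependency-free.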

finding

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Superposition · concept · 0.850
Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
Cross-layer superposition · concept · 0.802
Representation of features spread across multiple layers, complicating dictionary learning.
Superposition in Neural Networks · concept · 0.795
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
Superposition of Simulacra · concept · 0.788
The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
Genesis Hypothesis · framework · 0.783
The conjecture that consciousness does not result from the organized mind but creates and maintains complex models of reality; forms at the beginning of mental development
Simulacra in Superposition Framework · framework · 0.781
The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
Universality Hypothesis · concept · 0.779
The hypothesis that analogous features and circuits reliably form across different neural network models and tasks
Isotropic Superposition Model · framework · 0.774
Prior model of superposition where features are discrete 1D objects repelling each other roughly evenly; paper argues this is incomplete