Superposition

Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
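The geometric fact behind this is that a space with d dimensions can hold far more than d directions that are nearly, though not exactly, orthogonal. The minimal sketch below checks this with random unit vectors using only the Python standard library. Treating features as random directions is a simplifying assumption, since trained models learn their own geometry. The sizes d = 64 and n = 256 are arbitrary illustrative choices.

```python
import math
import random
from itertools import combinations


def random_unit_vectors(n, d, seed=0):
    """Draw n directions uniformly on the unit sphere in R^d (normalized Gaussians)."""
    rng = random.Random(seed)
    vecs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs


def cosine_stats(vecs):
    """Return (mean |cos|, max |cos|) over all distinct pairs of unit vectors."""
    total, worst, count = 0.0, 0.0, 0
    for u, v in combinations(vecs, 2):
        c = abs(sum(a * b for a, b in zip(u, v)))  # unit vectors, so the dot product is the cosine
        total += c
        worst = max(worst, c)
        count += 1
    return total / count, worst


if __name__ == "__main__":
    d, n = 64, 256  # four times more feature directions than dimensions
    mean_cos, max_cos = cosine_stats(random_unit_vectors(n, d))
    print(f"{n} directions in {d} dims: mean |cos| = {mean_cos:.3f}, max |cos| = {max_cos:.3f}")
```

For random directions the typical |cos| shrinks roughly like 1/√d, so the interference each feature picks up from the others falls as the dimension grows, even when the number of directions exceeds d.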

Neighborhood (ranked by edge count)

Circuit Motif · extends
A recurring, abstract pattern found in circuits (e.g., equivariance, unioning over cases), inspired by circuit motifs in systems biology.

Polysemantic Neuron · associated_with
A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation.

cosine similarity ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Superposition Hypothesis · framework · 0.850
Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition.

Superposition in Neural Networks · concept · 0.844
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.

Memorization in Superposition · concept · 0.842
Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons.

Cross-layer superposition · concept · 0.839
Representation of features spread across multiple layers, complicating dictionary learning.

Superposition of Simulacra · concept · 0.830
The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds.

Simulacra in Superposition Framework · framework · 0.798
The more nuanced second metaphor: the LLM as a simulator maintaining a superposition of possible simulacra across a multiverse of characters.

Superposition in Residual Stream · concept · 0.793
The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces.

Isotropic Superposition Model · framework · 0.772
Prior model of superposition in which features are discrete 1D objects that repel each other roughly evenly; the paper argues this picture is incomplete.