concept
active
concept:superposition
Superposition
Phenomenon in which a model represents more features than it has dimensions by assigning features to almost-orthogonal directions.
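The geometric idea behind this definition can be illustrated with a small numerical sketch. It is not taken from any of the linked papers. It assumes random unit vectors as stand-ins for feature directions, and the function name `max_interference` and the sizes used are hypothetical. The point is only that many more directions than dimensions can coexist while every pair stays far from parallel.

```python
import numpy as np


def max_interference(n_features: int, n_dims: int, seed: int = 0) -> float:
    """Largest |cosine| between any two of n_features random unit directions.

    Assumption: random Gaussian directions stand in for learned feature
    directions; a trained model would choose its directions differently.
    """
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_features, n_dims))
    # Normalize each row so that dot products are cosines.
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    cosines = directions @ directions.T
    # Ignore each direction's similarity with itself.
    np.fill_diagonal(cosines, 0.0)
    return float(np.abs(cosines).max())


if __name__ == "__main__":
    # Twice as many features as dimensions: exact orthogonality is impossible,
    # but the worst-case overlap between two directions remains small.
    print(max_interference(n_features=512, n_dims=256))
```

With more features than dimensions the overlap cannot be zero for every pair, which is why the directions are described as almost orthogonal rather than orthogonal.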
Neighborhood — ranked by edge-count
Papers (2)
paper
- Zoom In: An Introduction to Circuits (introduces)
Concepts (2)
concept
- Circuit Motif (extends): A recurring, abstract pattern found in circuits (e.g., equivariance, unioning over cases), inspired by circuit motifs in systems biology.
- Polysemantic Neuron (associated_with): A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
- Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
- Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
- Representation of features spread across multiple layers, complicating dictionary learning.
- The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
- The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
- The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
- Prior model of superposition where features are discrete 1D objects repelling each other roughly evenly; paper argues this is incomplete
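
The selection rule stated for this list (cosine similarity of at least 0.65 and no typed edge to this entity) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function name `similarity_candidates`, the embedding vectors, and the dictionary layout are all assumptions made for the example.

```python
import numpy as np


def similarity_candidates(target, others, typed_neighbors, threshold=0.65):
    """Return (name, cosine) pairs for entities similar to `target`.

    Hypothetical inputs:
      target          -- embedding vector of the current entity
      others          -- dict mapping entity name to embedding vector
      typed_neighbors -- names already linked to the entity by a typed edge
      threshold       -- minimum cosine similarity (0.65, as stated on the page)
    """
    target = np.asarray(target, dtype=float)
    candidates = []
    for name, vector in others.items():
        if name in typed_neighbors:
            continue  # already has a typed relation, so it is not a candidate
        vector = np.asarray(vector, dtype=float)
        cosine = float(
            target @ vector / (np.linalg.norm(target) * np.linalg.norm(vector))
        )
        if cosine >= threshold:
            candidates.append((name, cosine))
    return candidates


if __name__ == "__main__":
    # Toy two-dimensional embeddings, invented purely for illustration.
    others = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(similarity_candidates([1.0, 0.0], others, typed_neighbors={"a"}))
```

In the toy data, `a` is excluded because it already has a typed edge, `b` falls below the threshold, and only `c` remains as a candidate.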