concept
active
concept:superposition

Superposition

Phenomenon in which a model represents more features than it has dimensions by assigning features to almost-orthogonal directions in its activation space.
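A minimal sketch of the geometric idea, under stated assumptions: the feature directions here are random Gaussian vectors normalized to unit length, which is an illustrative stand-in and not how a trained model chooses its directions. The names `random_unit_vectors`, `N_DIMS`, `N_FEATURES`, and `active` are hypothetical. The sketch packs twice as many directions as dimensions into one space, measures how much they overlap on average, and checks that a single active feature can still be read out with dot products.

```python
import math
import random


def random_unit_vectors(n_features, n_dims, seed=0):
    """Sample n_features random directions on the unit sphere in n_dims."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n_features):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


N_DIMS = 64       # size of the representation space (assumed for illustration)
N_FEATURES = 128  # more features than dimensions

features = random_unit_vectors(N_FEATURES, N_DIMS)

# Interference: absolute cosine similarity between every pair of distinct
# feature directions. Exactly orthogonal directions would give 0 everywhere,
# which is impossible once N_FEATURES exceeds N_DIMS.
overlaps = [
    abs(dot(features[i], features[j]))
    for i in range(N_FEATURES)
    for j in range(i + 1, N_FEATURES)
]
mean_overlap = sum(overlaps) / len(overlaps)

# Sparse case: only one feature is active, so the activation vector is just
# that feature's direction. Reading out with dot products recovers it,
# because every other direction overlaps with it by less than 1.
active = 17
activation = features[active]
readout = [dot(activation, f) for f in features]
decoded = max(range(N_FEATURES), key=lambda k: readout[k])

print(f"{N_FEATURES} features in {N_DIMS} dimensions")
print(f"mean |cosine| between distinct features: {mean_overlap:.3f}")
print(f"active feature {active}, decoded feature {decoded}")
```

The readout only stays this clean when few features are active at once. With many features active together, the small pairwise overlaps add up and the dot-product readout becomes noisy.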

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • A recurring, abstract pattern found in circuits (e.g., equivariance, unioning over cases), inspired by circuit motifs in systems biology
  • Polysemantic Neuron
    associated_with
    A neuron that responds to multiple unrelated inputs, posing a major challenge for circuit-level interpretation

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
  • Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
  • Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
  • Representation of features spread across multiple layers, complicating dictionary learning.
  • The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
  • The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
  • The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
  • Prior model of superposition where features are discrete 1D objects repelling each other roughly evenly; paper argues this is incomplete