framework
active
framework:superposition-hypothesis

Superposition Hypothesis

Core theoretical framework: neural networks represent more features than they have neurons by encoding features as almost-orthogonal directions in activation space, an arrangement called superposition.

Neighborhood — ranked by edge-count

Concepts (6)

concept
  • The idea that features are encoded as directions in activation space.
  • Polysemanticity
    associated_with
    Neurons that respond to multiple unrelated concepts, limiting interpretability.
  • Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
  • Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
  • Mechanism by which superposition works: small neural networks exploit sparsity to approximately simulate much larger sparse networks
  • Overcomplete Basis
    associated_with
    A set of feature directions that is larger than the dimensionality of the space, enabling superposition
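
The concepts above (features as directions, sparsity, an overcomplete basis) can be illustrated with a small numerical sketch. It is not taken from the source: the sizes (512 features in 128 dimensions), the random unit directions, and the helper names are all assumptions chosen for illustration. It shows that when only a few features are active, a plain dot-product readout can still pick them out despite interference between directions.

```python
import numpy as np

def random_feature_directions(n_features, dim, seed=0):
    """Hypothetical helper: one random unit vector per feature (rows of W)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, dim))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def max_interference(W):
    """Largest absolute cosine similarity between two distinct feature directions."""
    G = W @ W.T
    np.fill_diagonal(G, 0.0)  # ignore each direction's similarity with itself
    return np.abs(G).max()

def encode(W, feature_values):
    """Superpose features: the activation is a weighted sum of feature directions."""
    return feature_values @ W

def decode(W, activation):
    """Read each feature back by projecting the activation onto its direction."""
    return W @ activation

# Overcomplete basis: 512 feature directions in a 128-dimensional space (assumed sizes).
W = random_feature_directions(n_features=512, dim=128)

# Sparse input: only two of the 512 features are active.
x = np.zeros(512)
x[[3, 200]] = 1.0

h = encode(W, x)        # 128-dimensional activation
x_hat = decode(W, h)    # noisy readout of all 512 features
recovered = np.argsort(x_hat)[-2:]  # indices of the two largest readouts

print("max interference:", max_interference(W))
print("recovered features:", sorted(recovered.tolist()))
```

With dense inputs the interference terms would add up and swamp the readout, which is why sparsity is listed above as the property that lets superposition work.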

Claims (2)

claim

Frameworks (2)

framework
  • Primary method introduced: trains a one-hidden-layer MLP with L1 sparsity penalty to decompose model activations into overcomplete feature dictionaries
  • Disentanglement
    contradicts
    Related research agenda seeking representations that separate conceptually distinct factors; contrasted with superposition approach
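
The first framework entry above describes a one-hidden-layer MLP trained with an L1 sparsity penalty, a setup commonly called a sparse autoencoder (that name does not appear in the entry and is an assumption here). The following is a minimal sketch of the forward pass and loss only, with no training loop. The parameter names, initialization, dimensions, and `l1_coeff` value are hypothetical and not the original method's settings.

```python
import numpy as np

def init_sae(d_model, d_dict, seed=0):
    """Hypothetical initialization: unit-norm decoder rows, encoder tied at init."""
    rng = np.random.default_rng(seed)
    W_dec = rng.standard_normal((d_dict, d_model))
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
    return {
        "W_enc": W_dec.T.copy(),    # (d_model, d_dict)
        "b_enc": np.zeros(d_dict),
        "W_dec": W_dec,             # (d_dict, d_model): the feature dictionary
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """One hidden layer: ReLU feature activations, then linear reconstruction."""
    f = np.maximum(0.0, x @ params["W_enc"] + params["b_enc"])
    x_hat = f @ params["W_dec"] + params["b_dec"]
    return f, x_hat

def sae_loss(params, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on feature activations."""
    f, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity

# Overcomplete dictionary: 64 features for 16-dimensional activations (assumed sizes).
params = init_sae(d_model=16, d_dict=64)
batch = np.random.default_rng(1).standard_normal((8, 16))  # stand-in for model activations

features, reconstruction = sae_forward(params, batch)
loss = sae_loss(params, batch)
print("loss:", loss)
```

In practice the parameters would be fitted by gradient descent on this loss; the L1 term pushes most feature activations to zero so that each input is explained by a few dictionary directions.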

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Superposition · concept · 0.850
    Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
  • Representation of features spread across multiple layers, complicating dictionary learning.
  • Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
  • The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
  • Genesis Hypothesis · framework · 0.783
    The conjecture that consciousness is not a product of the organized mind but instead creates and maintains complex models of reality, forming at the beginning of mental development
  • The more nuanced second metaphor: LLM as simulator maintaining a superposition of possible simulacra across a multiverse of characters
  • The hypothesis that analogous features and circuits reliably form across different neural network models and tasks
  • Prior model of superposition where features are discrete 1D objects repelling each other roughly evenly; paper argues this is incomplete