concept
active
concept:superposition-of-sparse-features

Superposition of Sparse Features

Mechanistic finding by Bricken et al. (2023) about how LLMs store features; cited as the operational justification for the pattern-repository assumption.

Neighborhood — ranked by edge-count

Concepts (1)

concept

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The extracted set of sparse interpretable features from model embeddings via SAEs
  • Feature Sparsity · concept · 0.784
    Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
  • The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
  • Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
  • Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons
  • Coding scheme where qualities are represented by few neurons with continuous similarity relations.
  • The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds
  • Superposition · concept · 0.760
    Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
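
The neighboring entries describe superposition as packing more features than dimensions into almost-orthogonal directions, with sparsity keeping interference low. A minimal sketch of that idea follows, assuming random unit vectors as the feature directions and a plain dot-product readout. The function name `superposition_demo` and all parameter values are hypothetical, chosen for illustration only; this is not the method of Bricken et al.

```python
import numpy as np

def superposition_demo(n_features=512, n_dims=256, n_active=3, seed=0):
    """Toy illustration: store n_features > n_dims features in one vector."""
    rng = np.random.default_rng(seed)

    # Assumption: random unit vectors stand in for learned feature directions.
    # In high dimensions they are almost (not exactly) orthogonal.
    directions = rng.standard_normal((n_features, n_dims))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Sparse input: only a few of the features are active at once.
    active = rng.choice(n_features, size=n_active, replace=False)
    x = np.zeros(n_features)
    x[active] = 1.0

    # Superpose the active features into a single n_dims-dimensional vector.
    h = x @ directions

    # Read out by projecting onto every feature direction and keeping
    # the n_active largest scores.
    scores = directions @ h
    recovered = np.argsort(scores)[-n_active:]

    # Largest absolute cosine between two distinct directions (interference).
    gram = directions @ directions.T
    interference = np.abs(gram - np.eye(n_features)).max()

    return set(active.tolist()), set(recovered.tolist()), float(interference)

if __name__ == "__main__":
    active, recovered, interference = superposition_demo()
    print("active features:   ", sorted(active))
    print("recovered features:", sorted(recovered))
    print("max interference:  ", round(interference, 3))
```

Raising `n_active` toward `n_dims` in this sketch makes the readout increasingly unreliable, which is the sense in which feature sparsity is what allows superposition to work.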