Superposition of Sparse Features

Mechanistic finding by Bricken et al. (2023) on how LLMs store features; cited as operational justification for the pattern-repository assumption.

Neighborhood — ranked by edge-count

concept

latent pattern repository (P_prior)
supports
Unlabeled statistical regularities stored during pretraining.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Sparse Feature Dictionary · concept · 0.811
The set of sparse, interpretable features extracted from model embeddings via sparse autoencoders (SAEs).
Feature Sparsity · concept · 0.784
Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work.
Superposition in Residual Stream · concept · 0.770
The phenomenon where the residual stream carries many more features than it has dimensions by encoding information across overlapping subspaces.
Superposition in Neural Networks · concept · 0.767
Theoretical model of how neural networks encode more features than dimensions, informing linear representation work.
Memorization in Superposition · concept · 0.763
Specific phrases or sequences memorized via binary features in superposition, enabling narrow pattern matching despite few neurons.
Sparse and smooth coding · concept · 0.762
Coding scheme where qualities are represented by few neurons with continuous similarity relations.
Superposition of Simulacra · concept · 0.761
The state in which a dialogue agent maintains multiple possible characters simultaneously, refined as the conversation proceeds.
Superposition · concept · 0.760
Phenomenon where models represent more features than dimensions via almost-orthogonal directions.
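
The core idea shared by several of the entries above (more features than dimensions, almost-orthogonal directions, sparse activation) can be illustrated with a small numerical sketch. This is a toy under stated assumptions, not the method of Bricken et al.: the feature directions are random unit vectors rather than learned ones, activations are binary, and the function names (`random_feature_directions`, `max_interference`, `encode`, `decode_top_k`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def random_feature_directions(n_features, n_dims, seed=0):
    """Return n_features random unit vectors in n_dims dimensions.

    Assumption: random Gaussian directions stand in for learned features.
    In high dimensions they are almost orthogonal to each other.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, n_dims))
    return W / np.linalg.norm(W, axis=1, keepdims=True)


def max_interference(W):
    """Largest absolute cosine similarity between two distinct features."""
    G = W @ W.T                # Gram matrix; rows of W are unit vectors
    np.fill_diagonal(G, 0.0)   # ignore each feature's similarity to itself
    return float(np.abs(G).max())


def encode(active, W):
    """Superpose the directions of the active (binary) features."""
    return W[active].sum(axis=0)


def decode_top_k(x, W, k):
    """Recover the k features whose directions best match x.

    Assumption: the number of active features k is known (sparsity).
    """
    scores = W @ x
    return sorted(np.argsort(scores)[-k:].tolist())


# 1024 features stored in only 256 dimensions.
W = random_feature_directions(n_features=1024, n_dims=256, seed=0)
print("max interference:", max_interference(W))

# Only 3 of the 1024 features are active, so interference stays small.
active = [3, 77, 400]
x = encode(active, W)
print("recovered:", decode_top_k(x, W, k=len(active)))
```

The sketch relies on sparsity: each active feature scores close to 1 against its own direction, while inactive features only pick up small interference terms. As more features become active at once, those interference terms add up and recovery by simple projection degrades.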