concept
active
concept:feature-sparsity

Feature Sparsity

Property that each feature activates on only a small fraction of inputs; this sparsity enables compressed-sensing-style recovery and is what allows superposition to work

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition

Concepts (1)

concept
  • Average number of nonzero feature entries per input; primary measure of activation sparsity in the autoencoder
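As an illustration of the measure described in this entry, here is a minimal sketch that computes the average number of nonzero feature entries per input, along with the fraction of inputs on which each feature fires. The function names `mean_l0` and `firing_fraction`, the `eps` threshold, and the toy activation matrix are assumptions made for the example; they are not taken from any particular autoencoder codebase.

```python
from typing import List

Matrix = List[List[float]]  # rows = inputs, columns = features


def mean_l0(activations: Matrix, eps: float = 0.0) -> float:
    """Average number of entries with |value| > eps per input row."""
    if not activations:
        raise ValueError("need at least one input")
    counts = [sum(1 for v in row if abs(v) > eps) for row in activations]
    return sum(counts) / len(counts)


def firing_fraction(activations: Matrix, eps: float = 0.0) -> List[float]:
    """Fraction of inputs on which each feature is active."""
    if not activations:
        raise ValueError("need at least one input")
    n = len(activations)
    width = len(activations[0])
    return [
        sum(1 for row in activations if abs(row[j]) > eps) / n
        for j in range(width)
    ]


# Hypothetical toy data: 4 inputs, 6 features, mostly zeros.
acts = [
    [0.0, 1.2, 0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7, 0.0, 0.0],
    [2.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.4],
]

print(mean_l0(acts))          # average active features per input
print(firing_fraction(acts))  # per-feature activation frequency
```

The two quantities are linked: summing the per-feature firing fractions gives the average count per input, so a low average count means the typical feature is active on only a small fraction of inputs.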

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The extracted set of sparse interpretable features from model embeddings via SAEs
  • Mechanistic finding by Bricken et al. 2023 about how LLMs store features; cited as operational justification for pattern-repository assumption
  • Feature splitting · concept · 0.782
    Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
  • Method of optimizing input to cause a neuron to fire maximally, used to characterize what a neuron detects; establishes causal link
  • Property of features that form consistently across different models trained on the same or similar data, suggesting features are real representational units
  • Feature Looping · concept · 0.774
    Repetitive behavioral pattern observed under high steering strengths in SAE feature self-steering experiments
  • Pure Feature · concept · 0.774
    A feature that responds to only a single latent variable, contrasted with polysemantic features
  • Patterns of which features activate together across tokens; preserved by covariance pooling but lost in mean pooling.
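The contrast in the last entry, between covariance pooling and mean pooling, can be sketched with a small example. The helper names `mean_pool` and `covariance_pool`, the use of population covariance (dividing by the number of tokens), and the two toy sequences are all assumptions for illustration; the source does not specify how the pooling is implemented.

```python
from typing import List

Tokens = List[List[float]]  # rows = tokens, columns = features


def mean_pool(tokens: Tokens) -> List[float]:
    """Average each feature over all tokens."""
    n = len(tokens)
    d = len(tokens[0])
    return [sum(t[j] for t in tokens) / n for j in range(d)]


def covariance_pool(tokens: Tokens) -> List[List[float]]:
    """Population covariance of features across tokens (assumed convention)."""
    n = len(tokens)
    d = len(tokens[0])
    mu = mean_pool(tokens)
    return [
        [sum((t[i] - mu[i]) * (t[j] - mu[j]) for t in tokens) / n for j in range(d)]
        for i in range(d)
    ]


# Hypothetical sequences with two features each.
seq_a = [[1.0, 1.0], [0.0, 0.0]]  # features fire together
seq_b = [[1.0, 0.0], [0.0, 1.0]]  # features fire on different tokens

print(mean_pool(seq_a), mean_pool(seq_b))  # identical mean vectors
print(covariance_pool(seq_a)[0][1])        # positive off-diagonal term
print(covariance_pool(seq_b)[0][1])        # negative off-diagonal term
```

Both sequences produce the same mean-pooled vector, so mean pooling cannot tell them apart. The off-diagonal covariance term differs in sign, which is the co-activation information the entry says covariance pooling preserves.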