concept
active
concept:dead-features

Dead features

Sparse autoencoder (SAE) features that never activate across a large sample of data, indicating that part of the dictionary's capacity is going unused.
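A minimal sketch of how this definition could be checked, assuming feature activations have already been collected as a samples × features matrix. The function names (`feature_density`, `dead_feature_indices`) and the toy data are hypothetical, and treating "never activates" as "zero on every sample in the batch" is an assumption; in practice the sample would need to be large for the result to be meaningful.

```python
from typing import List, Sequence


def feature_density(activations: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of samples on which each feature has a nonzero activation."""
    if not activations:
        raise ValueError("need at least one sample")
    n_samples = len(activations)
    n_features = len(activations[0])
    counts = [0] * n_features
    for row in activations:
        for j, value in enumerate(row):
            if value != 0.0:
                counts[j] += 1
    return [c / n_samples for c in counts]


def dead_feature_indices(activations: Sequence[Sequence[float]]) -> List[int]:
    """Indices of features that are zero on every sample (assumed definition)."""
    densities = feature_density(activations)
    return [j for j, d in enumerate(densities) if d == 0.0]


# Toy data: 3 samples, 4 features. Feature 2 never fires.
acts = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.7],
    [0.5, 0.0, 0.0, 0.1],
]
dead = dead_feature_indices(acts)
dead_fraction = len(dead) / len(acts[0])
```

On the toy matrix only the third feature is flagged, so one quarter of the dictionary counts as dead. The same per-feature counts give the feature density listed among the related concepts.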

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Action Features · concept · 0.752
    Dual interpretation of features: in addition to responding to inputs, features also act to increase the probability of specific output tokens.
  • C_dead · concept · 0.750
    The class of dead buildings, essentially all configurations except the living ones; almost coextensive with C_all.
  • Feature Density · concept · 0.733
    Fraction of training tokens on which a given feature has nonzero activation; used as a proxy metric for autoencoder quality.
  • Pure Feature · concept · 0.726
    A feature that responds to only a single latent variable, contrasted with polysemantic features.
  • Feature Sparsity · concept · 0.717
    Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work.
  • Internal Features · concept · 0.713
    Representations inside LLMs that can be intervened upon.
  • Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
  • Domain of techniques for constructing informative features from raw data; covariance pooling is a feature engineering method for token sequences.