Dead features

Sparse autoencoder (SAE) features that never activate on a large sample of data. A high proportion of dead features indicates that the dictionary's capacity is being used inefficiently.
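The definition above can be made concrete with a minimal sketch, not a reference implementation. It assumes activations are available as a samples-by-features table of non-negative numbers (as after a ReLU), and it treats a feature as dead when its activation is exactly zero on every sample. The function names, the toy data, and the strict zero threshold are all illustrative assumptions; in practice the sample size and any tolerance would need to be chosen for the model at hand.

```python
import random


def feature_density(activations):
    """Per-feature fraction of samples with a nonzero activation."""
    n_samples = len(activations)
    n_features = len(activations[0])
    counts = [0] * n_features
    for row in activations:
        for j, value in enumerate(row):
            if value != 0.0:
                counts[j] += 1
    return [count / n_samples for count in counts]


def dead_feature_indices(activations):
    """Indices of features that never activate on the given sample."""
    densities = feature_density(activations)
    return [j for j, density in enumerate(densities) if density == 0.0]


# Toy data (hypothetical): 1000 samples, 8 features, ReLU-style activations.
# Features 2 and 5 are forced to zero everywhere to play the role of dead features.
random.seed(0)
n_samples, n_features = 1000, 8
planted_dead = {2, 5}
acts = [
    [0.0 if j in planted_dead else max(random.gauss(0.0, 1.0), 0.0)
     for j in range(n_features)]
    for _ in range(n_samples)
]

dead = dead_feature_indices(acts)
dead_fraction = len(dead) / n_features
print("dead feature indices:", dead)
print("dead fraction:", dead_fraction)
```

A feature flagged this way is only dead relative to the sample examined, so a larger or more varied sample may revive some of the flagged indices.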

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Action Features · concept · 0.752
Dual interpretation of features: in addition to responding to inputs, features also act to increase probability of specific output tokens
C_dead · concept · 0.750
The class of dead buildings, essentially all configurations except the living ones; almost coextensive with C_all.
Feature Density · concept · 0.733
Fraction of training tokens on which a given feature has nonzero activation; used as proxy metric for autoencoder quality
Pure Feature · concept · 0.726
A feature that responds to only a single latent variable, contrasted with polysemantic features
Feature Sparsity · concept · 0.717
Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
Internal Features · concept · 0.713
Representations inside LLMs that can be intervened upon.
feature as application · concept · 0.711
Metaphor treating each system feature or function as a separate application that can be independently loaded and managed.
Feature engineering · concept · 0.703
Domain of techniques for constructing informative features from raw data; covariance pooling is a feature engineering method for token sequences.