finding
active
finding:34m-sae-had-roughly-65-dead-features
34M SAE had roughly 65% dead features.
Most features in the largest SAE were dead, indicating room for improvement.
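As an illustration of how a dead-feature percentage like this can be measured, here is a minimal sketch. It assumes a feature counts as "dead" when it never activates above a threshold on any token in an evaluation sample; the source does not state its exact criterion. The function name `dead_feature_fraction`, the `eps` parameter, and the toy data are hypothetical.

```python
from typing import Iterable, Sequence


def dead_feature_fraction(
    activations: Iterable[Sequence[float]],
    n_features: int,
    eps: float = 0.0,
) -> float:
    """Return the fraction of features that never fire above eps.

    `activations` yields one row per token, each row holding one
    activation value per SAE feature. The "never fires" definition of
    a dead feature is an assumption, not taken from the source.
    """
    fired = [False] * n_features
    for token_acts in activations:
        for j, value in enumerate(token_acts):
            if value > eps:
                fired[j] = True
    return fired.count(False) / n_features


# Toy data (hypothetical): 3 tokens, 4 features.
# Features 1 and 3 never activate, so half the features are dead.
toy_activations = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
fraction = dead_feature_fraction(toy_activations, n_features=4)
print(f"dead fraction: {fraction:.2f}")
```

In practice the result depends on how many tokens are sampled and on the threshold, so a figure such as "roughly 65%" is only comparable across runs that use the same evaluation setup.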
Source paper
extracted_from
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Basic SAE performance metrics.
- Empirical observation of feature splitting.
- The specific SAE architecture trained: 100K+ possible features compressed to 64 active per token for layer-40 activations
- SAE features are not simply mirroring individual neurons.
- 168 of 4,096 A/1 features are dead and 292 are ultralow density, leaving 3,636 for analysis · finding · 0.764 · Characterizes the live vs dead feature distribution in the main autoencoder run
- SAE Feature #77278 fires 195,040 times in corpus, associated with satisfaction vs. emptiness dimension · finding · 0.757 · High-frequency SAE feature reported as controlling fundamental positive vs. negative affect dimension
- Claim that feature grounding enables interpretability metrics.
- Shows high emotion subspace overlap for a specific negative emotion feature
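
The selection rule behind the neighbor list above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is an illustrative reconstruction, not the system's actual implementation: the embedding vectors, the entity IDs, the representation of typed edges as unordered pairs, and the function names `cosine` and `similarity_candidates` are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(target_id, embeddings, typed_edges, threshold=0.65):
    """List entities similar to the target that have no typed edge to it.

    `typed_edges` is assumed to be a set of frozenset pairs of entity IDs,
    i.e. edges are treated as undirected for this check.
    """
    target_vec = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already related by a typed edge
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first, as in a ranked neighbor list.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-D embeddings, for illustration only.
embeddings = {
    "this_finding": [1.0, 0.0],
    "near_untyped": [0.9, 0.1],
    "far_untyped": [0.0, 1.0],
    "near_typed": [1.0, 0.2],
}
typed_edges = {frozenset(("this_finding", "near_typed"))}
candidates = similarity_candidates("this_finding", embeddings, typed_edges)
print([entity_id for entity_id, _ in candidates])
```

Entities that pass this filter are only candidates: a high score may point to a missing edge or to a duplicate entry, and deciding which still requires review.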