concept
active
concept:sae-sparse-features-100k-features-64-active-per-token
SAE sparse features (100K+ features, 64 active per token)
The specific SAE (sparse autoencoder) architecture trained on layer-40 activations: a dictionary of 100K+ possible features, of which 64 are active per token.
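The "64 active out of 100K+" constraint can be illustrated with a top-k activation, one common way to enforce a fixed number of active features per token. This is a minimal sketch only: the entry does not say which sparsity mechanism was actually used, and the names `top_k_sparse`, `N_FEATURES`, and `K_ACTIVE` are hypothetical, as is the random input standing in for encoder outputs.

```python
import heapq
import random

N_FEATURES = 100_000  # assumed dictionary size; the entry only says "100K+"
K_ACTIVE = 64         # active features per token, as stated in the entry


def top_k_sparse(pre_activations, k=K_ACTIVE):
    """Keep the k largest pre-activations and drop everything else.

    Returns a sparse mapping {feature_index: activation}. Non-positive
    values are dropped as well (ReLU-style), so fewer than k features
    may remain if the input has fewer than k positive entries.
    """
    top = heapq.nlargest(
        k, range(len(pre_activations)), key=lambda i: pre_activations[i]
    )
    return {i: pre_activations[i] for i in top if pre_activations[i] > 0.0}


# Hypothetical stand-in for one token's encoder pre-activations.
rng = random.Random(0)
pre = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]

active = top_k_sparse(pre)
print(len(active), "of", N_FEATURES, "features active for this token")
```

Storing only the surviving indices and values, rather than a dense 100K-wide vector, is what makes per-feature firing counts over a corpus cheap to accumulate.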
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Basic SAE performance metrics.
- Interpretability method criticized in this paper for shattering manifolds into atomic pieces, obscuring overarching semantic structure.
- Most features dead in largest SAE, indicating room for improvement.
- The individual, supposedly monosemantic directions learned by SAEs; argued here to fragment manifolds into disconnected pieces.
- SAE Feature #92372 fires 666,235 times in corpus, associated with urgency vs. receptive calm dimension · finding · 0.768 · Example of a highly active SAE feature modulating urgency versus acceptance as an emotional dimension
- SAE Feature #77278 fires 195,040 times in corpus, associated with satisfaction vs. emptiness dimension · finding · 0.768 · High-frequency SAE feature reported as controlling fundamental positive vs. negative affect dimension
- Sparse dictionary learning method used to extract interpretable features from EEG transformer embeddings.
- Foundational empirical result enabling all downstream analysis
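
The selection rule behind this list (cosine ≥ 0.65 and no typed edge to this entity) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the graph's actual implementation: the function names, the toy two-dimensional embeddings, and the representation of typed edges as unordered ID pairs are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, sorted by descending similarity."""
    results = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target_id], vec)
        if score >= threshold:
            results.append((other_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "d" is close to "a" but already has a typed edge.
toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.2]}
toy_typed_edges = {frozenset(("a", "d"))}

neighbors = similar_untyped("a", toy_embeddings, toy_typed_edges)
print(neighbors)
```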