finding
active
finding:a-san-francisco-feature-in-1m-sae-splits-into-11-fine-grained-features-in-34m-sae
A 'San Francisco' feature in 1M SAE splits into 11 fine-grained features in 34M SAE.
Empirical observation of feature splitting.
Source paper
extracted_from
Neighborhood, ranked by edge-count
Claims (1)
claim
- Observed across SAE scales, e.g., 'San Francisco' split into 11 features.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Most features dead in largest SAE, indicating room for improvement.
- The specific SAE architecture trained: 100K+ possible features compressed to 64 active per token for layer-40 activations
- SAE Feature #77278 fires 195,040 times in corpus, associated with satisfaction vs. emptiness dimension · finding · 0.759 · High-frequency SAE feature reported as controlling fundamental positive vs. negative affect dimension
- SAE Feature #92372 fires 666,235 times in corpus, associated with urgency vs. receptive calm dimension · finding · 0.754 · Example of a highly active SAE feature modulating urgency versus acceptance as an emotional dimension
- Basic SAE performance metrics.
- Core critique of sparse autoencoders: they break the geometric structure of representations, making it harder to see the big picture.
- Highly active SAE feature with broad emotional modulation and large corpus presence
- Shows that highest emotion-subspace-overlap features induce distinctive thematic outputs