claim
active
claim:feature-splitting-occurs-smaller-sae-features-split-into-multiple-finer-grained-features-in-larger-saes
Feature splitting occurs: smaller SAE features split into multiple finer-grained features in larger SAEs.
Observed across SAE scales; for example, a 'San Francisco' feature split into 11 features.
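One way to look for candidate splits can be sketched as follows: compare each decoder direction of a smaller SAE against the decoder directions of a larger one, and collect the larger-SAE features that point in nearly the same direction. This is a minimal illustration, not the method of the source paper. The function names, the toy decoder vectors, and the 0.7 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_candidates(dec_small, dec_large, threshold=0.7):
    """Map each small-SAE feature index to the large-SAE feature indices
    whose decoder direction has cosine similarity >= threshold.

    dec_small, dec_large: lists of decoder direction vectors (hypothetical inputs).
    threshold: illustrative cutoff, not a value taken from the source.
    """
    return {
        i: [j for j, w in enumerate(dec_large) if cosine(v, w) >= threshold]
        for i, v in enumerate(dec_small)
    }


# Toy data: small-SAE feature 0 has two near-parallel directions in the large SAE.
dec_small = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
dec_large = [
    [1.0, 0.3, 0.0],   # close to small feature 0
    [1.0, -0.3, 0.0],  # also close to small feature 0
    [0.0, 1.0, 0.0],   # identical to small feature 1
    [0.0, 0.0, 1.0],   # a new direction with no counterpart in the small SAE
]
result = split_candidates(dec_small, dec_large)
```

In this toy case, small feature 0 maps to two large features (a candidate split) and small feature 1 maps to one. Decoder similarity alone only flags candidates; confirming a split would still require inspecting what the features activate on.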
Source paper
extracted_from
Neighborhood, ranked by edge-count
Findings (1)
finding
- Empirical observation of feature splitting.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core critique of sparse autoencoders: they break the geometric structure of representations, making it harder to see the big picture.
- Authors argue that the absence of a fixed feature count is a property of the superposition geometry, not a failure of the method.
- Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
- Scaling SAE size increases granularity and discovers new features.
- Automated interpretability and specificity ratings show SAE features are clearer than MLP neurons.
- Surprising finding that the two evaluation methods diverge in their relationship with persistence.
- Core interpretative claim that VPD's parameter-based decomposition prevents the feature fragmentation seen in activation-based methods.
- A promising property for interpretability analysis off-distribution.