question
active
question:what-is-the-correct-number-of-features-for-dictionary-learning-and-is-this-question-well-posed
What is the 'correct number of features' for dictionary learning, and is this question well-posed?
Open question about whether there is a true discrete feature count or a continuous splitting process
Source paper
extracted_from · (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Claims (1)
claim
- Authors argue the absence of a fixed feature count is a property of the superposition geometry, not a failure of the method
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- What metrics can reliably tell us if dictionary learning has successfully extracted high quality features? · question · 0.811 · Central methodological gap: current metrics (loss, density histograms, manual inspection) are inadequate
- Motivation for using sparsity-based dictionary learning on language models
- Scaling laws for dictionary learning are unknown and needed to assess feasibility on frontier models
- Feature presence depends on concept frequency in training data, with a threshold scaling inversely with alive features.
- Authors argue features are model properties because logit effects and ablations are consistent with feature interpretations
- Optimal number of features scales faster than optimal number of training steps with compute budget. · finding · 0.772 · Allocation result from scaling laws.
- Controls for dataset structure, showing trained model activations have richer structure than data distribution alone
- Central question motivating attribute exploration.