finding
active
finding:optimal-number-of-features-scales-faster-than-optimal-number-of-training-steps-with-compute-budget
Optimal number of features scales faster than optimal number of training steps with compute budget.
Allocation result from scaling laws.
Source paper
extracted_from
Neighborhood (ranked by edge-count)
Claims (1)
claim
- Compute-optimal hyperparameters follow predictable power-law relationships.
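
As a rough illustration of what this finding and its claim assert, the sketch below fits power-law exponents in log-log space and compares them. All numbers are synthetic: the compute budgets, prefactors, and the exponents `HYP_FEATURE_EXP` and `HYP_STEP_EXP` are hypothetical placeholders, not values from the source paper, and `fit_power_law` is a helper introduced only for this example.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = k * x**a in log-log space; returns (k, a)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx                 # slope in log-log space = power-law exponent
    k = math.exp(my - a * mx)     # intercept in log-log space = prefactor
    return k, a

# Synthetic compute budgets (arbitrary units); NOT taken from the source.
compute = [1e15, 1e16, 1e17, 1e18]

# Hypothetical exponents chosen only so that features grow faster than steps.
HYP_FEATURE_EXP = 0.7
HYP_STEP_EXP = 0.3

# Synthetic "compute-optimal" settings generated from the assumed power laws.
opt_features = [2.0 * c ** HYP_FEATURE_EXP for c in compute]
opt_steps = [5.0 * c ** HYP_STEP_EXP for c in compute]

_, feature_exp = fit_power_law(compute, opt_features)
_, step_exp = fit_power_law(compute, opt_steps)

# The finding corresponds to the fitted feature exponent exceeding the step exponent.
features_scale_faster = feature_exp > step_exp
```

With real sweep data, `opt_features` and `opt_steps` would be replaced by the loss-minimizing settings measured at each compute budget, and the comparison of fitted exponents would then test the finding rather than assume it.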
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Hyperparameter trend observed.
- Quantitative relationship between concept frequency and feature presence.
- Selective pressure toward convergence via task generality
- what is the 'correct number of features' for dictionary learning, and is this question well-posed? · question · 0.772 · Open question about whether there is a true discrete feature count or a continuous splitting process.
- SAE training loss decreases as a power law with compute budget when using compute-optimal hyperparameters. · finding · 0.764 · From scaling laws sweep.
- Feature presence depends on concept frequency in training data, with a threshold scaling inversely with alive features.
- Main functional claim about MCA.
- Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study