finding
active
finding:sae-training-loss-decreases-as-a-power-law-with-compute-budget-when-using-compute-optimal-hyperparameters
SAE training loss decreases as a power law with compute budget when using compute-optimal hyperparameters.
From scaling laws sweep.
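As a rough illustration of what a power-law relationship between loss and compute looks like in practice, the sketch below fits loss ≈ a · C^(-b) by linear regression in log-log space. The function name `fit_power_law`, the synthetic data, and the constants 50.0 and 0.1 are assumptions for illustration only, not values from the source sweep. The source does not say whether the fitted form includes an irreducible-loss offset; this sketch assumes none.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by least squares in log-log space.

    Hypothetical helper: returns (a, b) with b > 0 when loss falls with compute.
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    # A power law is a straight line in log-log space:
    #   log(loss) = log(a) - b * log(compute)
    slope, intercept = np.polyfit(log_c, log_l, deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free data (illustrative only, not from the source sweep).
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = 50.0 * compute ** -0.1

a, b = fit_power_law(compute, loss)
```

On real sweep data, each point would be the loss reached at a given compute budget with the best hyperparameters found for that budget, and the fit would recover the exponent only approximately.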
Source paper
extracted_from
Neighborhood: ranked by edge-count
Claims (1)
claim
- Compute-optimal hyperparameters follow predictable power-law relationships.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The objective function combining L2 reconstruction error and L1 penalty scaled by decoder norm, used to train the SAE.
- Hyperparameter trend observed.
- Ethical implication about the nature of AI training experience if the thesis holds
- A promising property for interpretability analysis off-distribution.
- Optimal number of features scales faster than optimal number of training steps with compute budget. (finding, 0.764) Allocation result from scaling laws.
- Key methodological contribution claim about architecture-agnostic SAE tuning
- Author's conclusion after extensive investigation of architectural approaches to monosemanticity
- Training stability analysis.
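The first related entry above describes an objective that combines L2 reconstruction error with an L1 penalty scaled by decoder norm. A minimal sketch of such a loss follows, assuming a single-layer ReLU encoder and a linear decoder. The function name `sae_loss`, the parameter names, the tensor shapes, and the encoder form are all assumptions for illustration; the source states only the two terms of the objective.

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Hypothetical SAE objective: squared L2 reconstruction error plus an
    L1 penalty on feature activations, each weighted by its decoder row norm.

    Assumed shapes: x (batch, d_model), W_enc (d_model, n_features),
    W_dec (n_features, d_model).
    """
    f = np.maximum(x @ W_enc + b_enc, 0.0)        # ReLU activations (assumed encoder)
    x_hat = f @ W_dec + b_dec                     # linear reconstruction
    recon = np.sum((x - x_hat) ** 2, axis=-1)     # squared L2 error per example
    dec_norms = np.linalg.norm(W_dec, axis=-1)    # L2 norm of each feature's decoder row
    sparsity = np.sum(f * dec_norms, axis=-1)     # activations are non-negative, so no abs()
    return float(np.mean(recon + l1_coeff * sparsity))

# Tiny worked example with hand-checkable numbers.
x = np.array([[1.0, -1.0]])
W_enc = np.eye(2)
W_dec = 2.0 * np.eye(2)
zeros = np.zeros(2)
example_loss = sae_loss(x, W_enc, zeros, W_dec, zeros, l1_coeff=0.5)
```

Scaling the penalty by the decoder norm means the sparsity term cannot be reduced simply by shrinking activations while growing the decoder weights to compensate.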