finding
active
finding:optimal-learning-rate-decreases-as-a-power-law-with-compute-budget
Optimal learning rate decreases as a power law with compute budget.
Hyperparameter trend observed.
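A power-law trend of this kind is usually checked by fitting a straight line in log-log space. The sketch below assumes the form lr_opt ≈ a · C^(-b), where C is the compute budget. The function name `fit_power_law`, the data points, and the values a = 0.5 and b = 0.2 are all hypothetical and chosen only for illustration; none of them come from the source paper.

```python
import math

def fit_power_law(compute, lr):
    """Least-squares fit of lr ≈ a * compute**(-b), done in log-log space.

    Returns (a, b). Illustrative helper, not the source paper's method.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in lr]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for the line log(lr) = intercept + slope * log(C)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A decreasing power law has a negative slope, so b = -slope is positive
    return math.exp(intercept), -slope

# Synthetic data generated from an assumed law (a = 0.5, b = 0.2); not real measurements
compute = [1e15, 1e16, 1e17, 1e18]
lr = [0.5 * c ** -0.2 for c in compute]

a, b = fit_power_law(compute, lr)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real sweep results, `compute` and `lr` would hold the compute budget and the best-performing learning rate found at each budget. A positive fitted `b` corresponds to the decreasing trend stated in the finding.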
Source paper
extracted_from
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Optimal number of features scales faster than optimal number of training steps with compute budget. · finding · 0.813 · Allocation result from scaling laws.
- SAE training loss decreases as a power law with compute budget when using compute-optimal hyperparameters. · finding · 0.807 · From scaling laws sweep.
- Still & Precup (2012) formulation of epistemic imperatives behind curiosity; linked to active inference
- Hyperparameter for optimizing model parameters through learning in active inference.
- Clarifies what unsupervised learning does.
- Extrapolation of scaling predictive models to AGI.
- Key insight about predictive learning's potential.
- Type II error about cognition leads to missed opportunities for top-down control (e.g., training instead of rewiring).