Optimal learning rate decreases as a power law with compute budget.

Hyperparameter trend observed in the SAE scaling-laws sweep.
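A power-law trend of this kind is a straight line in log-log space, so it can be estimated with an ordinary linear fit. The sketch below is illustrative only: the function name `fit_power_law`, the compute budgets, and the coefficients are hypothetical placeholders, not values reported in the source paper.

```python
import numpy as np

def fit_power_law(compute, lr):
    """Fit lr ~= a * compute**(-b) by least squares in log-log space."""
    # log(lr) = log(a) - b * log(compute), so a degree-1 fit recovers both terms.
    slope, intercept = np.polyfit(np.log(compute), np.log(lr), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic data for illustration (assumed values, not taken from the paper).
compute = np.array([1e15, 1e16, 1e17, 1e18])   # hypothetical compute budgets
true_a, true_b = 10.0, 0.25                    # hypothetical coefficients
optimal_lr = true_a * compute ** (-true_b)

a, b = fit_power_law(compute, optimal_lr)

# Extrapolate the fitted trend to a larger (hypothetical) budget.
predicted_lr = a * 1e19 ** (-b)
```

With real sweep results, `optimal_lr` would hold the best learning rate found at each budget; a positive fitted `b` corresponds to the decreasing trend described above.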

Source paper

extracted_from

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Optimal number of features scales faster than optimal number of training steps with compute budget. · finding · 0.813
Allocation result from scaling laws.
SAE training loss decreases as a power law with compute budget when using compute-optimal hyperparameters. · finding · 0.807
From scaling laws sweep.
A learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. · quote · 0.746
Still & Precup (2012) formulation of epistemic imperatives behind curiosity; linked to active inference
Learning Rate · concept · 0.744
Hyperparameter for optimizing model parameters through learning in active inference.
Unsupervised learning builds a low-dimensional model of the input data. · claim · 0.737
Clarifies what unsupervised learning does.
If loss keeps going down on the test set, in the limit the model must be learning to interpret and predict all patterns represented in language, including common-sense reasoning, goal-directed optimization, and deployment of the sum of recorded human knowledge. · hypothesis · 0.732
Extrapolation of scaling predictive models to AGI.
The upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum. · claim · 0.729
Key insight about predictive learning's potential.
Under-estimating the capacity of a system for plasticity, learning, and intelligent problem-solving greatly reduces the toolkit of techniques for understanding and controlling its behavior. · claim · 0.721
Type II error about cognition leads to missed opportunities for top-down control (e.g., training instead of rewiring).