Power Law Scaling of Data and Model Performance

An empirically observed power-law relationship between data scale and model performance; supports the convergence hypothesis.

Neighborhood — ranked by edge-count

paper

concept

Power law scaling
related_to
Observation that SAE loss decreases as a power law with compute budget.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Inverse Scaling Law · concept · 0.787
Hypothesis cited in paper suggesting deceptive capabilities may scale with model size
Scaling laws can be used to guide the training of sparse autoencoders. · claim · 0.765
Compute-optimal hyperparameters follow predictable power-law relationships.
Deceptive capabilities may scale with model size (inverse scaling law hypothesis) · hypothesis · 0.763
Cited hypothesis from Lin et al. 2022 suggesting larger models become more capable of deception
Scaling laws analysis for SAE hyperparameters · method · 0.760
Sweeping number of features and training steps to find compute-optimal SAE configurations.
energy scaling · concept · 0.736
How the energy gain ΔE scales with perimeter length P; used to assess ordered phase existence
entropy scaling · concept · 0.724
How the entropy gain ΔS scales with perimeter length P
Maximum gradient norm scaling · concept · 0.718
Scaling aggregated gradient by the maximum gradient norm among tasks.
Multitask Scaling Hypothesis · hypothesis · 0.715
Argues that fewer representations are competent for N tasks than for M < N tasks, so more general models have a smaller solution space.