concept
active
concept:power-law-scaling-of-data-and-model-performance
Power Law Scaling of Data and Model Performance
Empirically observed power-law relationship between data scale and model performance; supports the convergence hypothesis.
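As an illustration of what such a relationship looks like in practice, the sketch below fits a curve of the form loss ≈ a · size^(-b) by ordinary least squares in log-log space. It assumes performance is measured as a loss that falls as data scale grows, which this entry does not specify; the function name `fit_power_law` and the sample points are hypothetical and synthetic, not taken from any cited paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ≈ a * size**(-b) by linear least squares on log-log values.

    Returns (a, b). Assumes all sizes and losses are strictly positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line log(loss) = intercept + slope * log(size).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from a known power law (illustrative only).
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [5.0 * s ** -0.3 for s in sizes]
a, b = fit_power_law(sizes, losses)
```

On real measurements the points would be noisy, and a straight line in log-log space over the observed range is only evidence of power-law behavior within that range.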
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Power law scaling · related_to · Observation that SAE loss decreases as a power law with compute budget.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Hypothesis cited in paper suggesting deceptive capabilities may scale with model size
- Compute-optimal hyperparameters follow predictable power-law relationships.
- Cited hypothesis from Lin et al. 2022 suggesting larger models become more capable of deception
- Sweeping number of features and training steps to find compute-optimal SAE configurations.
- How the energy gain ΔE scales with perimeter length P; used to assess ordered phase existence
- How the entropy gain ΔS scales with perimeter length P
- Scaling aggregated gradient by the maximum gradient norm among tasks.
- Argues that there are fewer representations competent for N tasks than M<N tasks, so more general models have a smaller solution space