Multitask Scaling Hypothesis

Argues that fewer representations are competent for N tasks than for M < N tasks, so more general models have a smaller solution space
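The counting argument can be illustrated with a toy sketch. It is not taken from the source paper: the finite pool of candidate representations, the random per-task acceptance sets, and the names `competent_counts`, `num_reps`, `num_tasks`, and `pass_rate` are all hypothetical. The only point is that the set of representations competent on every task is an intersection of per-task sets, so it cannot grow as tasks are added.

```python
import random


def competent_counts(num_reps=1000, num_tasks=6, pass_rate=0.6, seed=0):
    """Count representations that solve the first k tasks, for k = 0..num_tasks."""
    rng = random.Random(seed)
    # Hypothetical setup: each task accepts a random subset of representations.
    tasks = [
        {r for r in range(num_reps) if rng.random() < pass_rate}
        for _ in range(num_tasks)
    ]
    solutions = set(range(num_reps))  # with zero tasks, every representation qualifies
    counts = [len(solutions)]
    for accepted in tasks:
        # Competence on all tasks so far is the intersection of the per-task sets.
        solutions &= accepted
        counts.append(len(solutions))
    return counts


counts = competent_counts()
print(counts)  # a non-increasing sequence: more tasks, fewer competent representations
```

In this toy setting the shrinkage is guaranteed by set intersection alone; how quickly the solution space shrinks for real models is an empirical question the sketch does not address.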

Source paper

extracted_from

(2024) · Minyoung Huh · Brian Cheung · Tongzhou Wang · Phillip Isola

paper

finding

concept

Contravariance Principle
extends · supports
Cao & Yamins principle: the solution set for an easy goal is large, while the solution set for a challenging goal is comparatively smaller; cited as the theoretical basis for the multitask scaling hypothesis
Platonic Representation
supports
The hypothesized converged representation that all sufficiently large AI models are approaching — a statistical model of underlying reality
Power law scaling
supports
Observation that SAE loss decreases as a power law with compute budget.
Masked Autoencoders
supports
Self-supervised learning method that optimizes reconstruction tasks; included in the paper's analysis as a multi-task objective
Autoregressive Language Modeling
supports
Training objective interpretable as optimizing a diverse set of tasks; thus subject to multitask scaling convergence pressures

hypothesis

Capacity Hypothesis
associated_with
Bigger models are more likely to converge to a shared representation than smaller models because they can better approximate the global optimum

quote

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Multitask Scaling · concept · 0.848
The pressure on models trained on more tasks to find representations that generalize across all tasks, reducing the solution space
Multi-Level Selection Theory · framework · 0.784
Framework dividing covariance of character and fitness into between-collective and within-collective components; addresses limitation of kin selection.
Hypothesis 1 (Threshold Behavior): There exists a task-dependent threshold Sc such that performance exhibits sharp changes as S crosses Sc, with value and transition width depending on model, layer, and pooling · hypothesis · 0.775
Core testable hypothesis of UCCT about the nature of performance transitions under anchoring
Deceptive capabilities may scale with model size (inverse scaling law hypothesis) · hypothesis · 0.768
Cited hypothesis from Lin et al. 2022 suggesting larger models become more capable of deception
Scaling Supervision · framework · 0.763
Techniques that leverage AI to help humans more efficiently supervise AI.
Multilevel Selection Theory · framework · 0.754
Scaling of Cognitive Capacity via Modularity and Feedback Loops · hypothesis · 0.751
Processes scaling goals and stressors form positive feedback loop with modularity; both arise from and potentiate power of evolution, enabling specific predictions for cognitive capacity scaling.
Multi-scale competency greatly accelerates evolution and enables generalization. · claim · 0.749
Central thesis about the role of agency in evolutionary dynamics.