community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c14-c2
Gradient norm scaling for multitask learning
Investigates gradient balancing strategies across tasks, finding that scaling the normalized gradients by the maximum gradient norm outperforms the alternatives tested in multitask optimization.
4 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (2)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Claims (3)
- Setting the aggregated gradient scaling factor to the maximum gradient norm performs best for task balancing.
  Empirical finding on the choice of αk in the gradient normalization strategy.
- Setting αk to the maximum gradient norm among tasks performs best.
  Recommended strategy for gradient normalization.
- The magnitude of the normalized gradients (the choice of αk) plays an important role in performance.
  Insight about gradient normalization scaling.
Findings (1)
- Setting αk to the maximum gradient norm performs best among the tested strategies on NYUv2 (Figure 6).
  Sensitivity analysis for the gradient normalization scaling factor.
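
The claims above describe the choice of αk only in words, so a minimal sketch may help make it concrete. It assumes (this is not stated in the cluster) that each task gradient is first rescaled to unit norm, that the rescaled gradients are summed, and that the sum is then multiplied by αk, taken as the largest per-task gradient norm. The function name `aggregate_max_norm`, the `eps` guard, and the toy gradients are hypothetical and chosen for illustration only.

```python
import numpy as np


def aggregate_max_norm(task_grads, eps=1e-8):
    """Hypothetical helper: combine per-task gradients with alpha_k = max norm.

    Assumed scheme (not confirmed by the source):
      1. compute the L2 norm of each task gradient,
      2. rescale each gradient to (approximately) unit norm,
      3. sum the rescaled gradients,
      4. multiply the sum by alpha_k, the largest of the original norms.
    """
    grads = np.stack([np.asarray(g, dtype=float) for g in task_grads])
    norms = np.linalg.norm(grads, axis=1)   # one L2 norm per task
    alpha = norms.max()                     # alpha_k = maximum gradient norm
    unit = grads / (norms[:, None] + eps)   # eps avoids division by zero
    return alpha * unit.sum(axis=0), alpha


# Toy example: two tasks whose gradient magnitudes differ by a factor of 50.
g_large = np.array([3.0, 4.0])   # norm 5.0
g_small = np.array([0.0, 0.1])   # norm 0.1
update, alpha = aggregate_max_norm([g_large, g_small])
print(alpha, update)
```

Under these assumptions, both tasks contribute a direction of equal length, so the small-gradient task is no longer drowned out, while the overall step keeps the scale of the largest task gradient.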