Gradient norm scaling for multitask learning

Investigates optimal gradient balancing strategies across tasks, finding maximum gradient norm normalization outperforms alternatives in multitask optimization.

4 members. Each node is clickable.

Loading graph…

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Dual-Balancing for Multi-Task Learning4 members

Bridges (2)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Dual-balancing multi-task learning4 shared
Dual balancing multi-task learning3 shared

Claims (3)

Setting aggregated gradient scaling factor to maximum gradient norm performs best for task balancingEmpirical finding on choice of αk in gradient normalization strategy
Setting αk as the maximum gradient norm among tasks performs best.Recommended strategy for gradient normalization.
The magnitude of the normalized gradients (choice of αk) plays an important role in performance.Insight about gradient normalization scaling.

Findings (1)

Setting αk to the maximum gradient norm performs best among tested strategies on NYUv2 (Figure 6).Sensitivity analysis for gradient normalization scaling factor.