Dual balancing multi-task learning

DB-MTL combines loss-scale and gradient-magnitude balancing, benchmarked across NYUv2, Cityscapes, QM9, and Office datasets.

22 members.

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Dual-Balancing for Multi-Task Learning (29 members)

Bridges (10)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Dual-balancing multi-task learning: 21 shared
Loss-scale balancing via logarithmic transformation: 5 shared
Dynamic balancing for multi-task learning: 5 shared
Multi-task learning gradient balancing: 3 shared
Gradient norm scaling for multitask learning: 3 shared
Multi-task learning gradient balancing: 2 shared
Gradient conflict mitigation in multi-task learning: 2 shared
Gradient magnitude balancing for multitask learning: 1 shared
Skill-based system design principles: 1 shared
Design principles for care-centered systems: 1 shared

Claims (14)

DB-MTL does not affect training stability; losses converge smoothly. (Training stability claim.)
DB-MTL is a simple yet effective method that addresses both loss-scale and gradient-magnitude imbalances. (Core claim of the paper.)
DB-MTL reduces gradient conflict and improves task balance compared to equal weighting (EW). (Effect on gradient conflict.)
IMTL-L is equivalent to the logarithm transformation when its parameter s_t is the exact minimizer in each iteration. (Mathematical relationship between IMTL-L and the logarithm transformation.)
The logarithm transformation is simpler and more effective than a learnable loss transformation. (Compared to IMTL-L: parameter-free, no extra computational cost, achieves the same theoretical goal.)
Loss-scale balancing and gradient-magnitude balancing are complementary, and combining them achieves the best performance. (Ablation conclusion.)
Setting the aggregated gradient scaling factor to the maximum gradient norm performs best for task balancing. (Empirical finding on the choice of α_k in the gradient normalization strategy.)
Setting α_k as the maximum gradient norm among tasks performs best. (Recommended strategy for gradient normalization.)
Task balancing is still an open problem in multi-task learning. (Motivation for the proposed method.)
Task balancing requires simultaneous consideration of both loss scales and gradient magnitudes. (Core interpretive position of DB-MTL: complementarity of the loss and gradient perspectives.)
The logarithm transformation also benefits existing gradient balancing methods. (Generalization of the loss transformation.)
The logarithm transformation is simpler and more effective than IMTL-L because it is parameter-free. (Comparison of loss-scale balancing techniques.)
The magnitude of the normalized gradients (the choice of α_k) plays an important role in performance. (Insight about gradient normalization scaling.)
The proposed gradient-magnitude balancing method consistently outperforms GradNorm, as it guarantees equal gradient magnitudes and considers the update magnitude. (Advantage over GradNorm.)
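
The IMTL-L equivalence claimed above can be checked with a short minimization. This is a sketch under an assumption not stated on this page: that IMTL-L transforms each task loss as e^{s_t} ℓ_t − s_t with a learnable scalar s_t. The symbols ℓ_t (a positive task loss) and θ (the model parameters) are notation introduced here for illustration.

```latex
\begin{aligned}
\frac{\partial}{\partial s_t}\bigl(e^{s_t}\ell_t - s_t\bigr)
  &= e^{s_t}\ell_t - 1 = 0
  \;\Longrightarrow\; s_t^{\star} = -\log \ell_t, \\
e^{s_t^{\star}}\ell_t - s_t^{\star}
  &= 1 + \log \ell_t, \\
\nabla_\theta \bigl(e^{s_t^{\star}}\ell_t\bigr)\Big|_{s_t^{\star}\ \text{held fixed}}
  &= \frac{\nabla_\theta \ell_t}{\ell_t}
  = \nabla_\theta \log \ell_t .
\end{aligned}
```

Because the second derivative e^{s_t} ℓ_t is positive whenever ℓ_t > 0, the stationary point is the unique minimizer. At that point the transformed loss equals log ℓ_t plus a constant, and its gradient with respect to θ matches the gradient of log ℓ_t, which is consistent with the claim that the parameter-free logarithm reaches the same goal without learning s_t.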

Findings (8)

Combining loss-scale and gradient-magnitude balancing achieves Δp = +1.15±0.16 on NYUv2. (Full DB-MTL ablation result.)
DB-MTL achieves Δp = +1.15±0.16 on NYUv2, outperforming all baselines, including state-of-the-art methods. (Primary empirical validation on a scene understanding task.)
DB-MTL has a per-epoch running time similar to that of gradient balancing methods on NYUv2 and is slower than loss balancing methods. (Computational efficiency comparison.)
DB-MTL increases gradient cosine similarity faster and keeps it positive on Office-31, reducing gradient conflict compared to EW. (Analysis of gradient conflict reduction.)
DB-MTL training losses decrease smoothly and gradient norms are lower than with EW on NYUv2, indicating training stability. (Training stability analysis.)
DB-MTL with an EMA forgetting rate β in a wide range performs better than without EMA (β=0) on Office-31. (Effect of the EMA forgetting rate on performance.)
DB-MTL with a SegNet backbone achieves Δp = +8.91 on NYUv2, the best among all methods. (Performance with a different backbone network.)
The logarithm transformation improves PCGrad, GradVac, IMTL-G, CAGrad, Nash-MTL, and Aligned-MTL on NYUv2 (Figure 1). (Effectiveness of the logarithm transformation as a plug-in for gradient balancing methods.)
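
The pieces named in the claims and findings above (logarithm transformation, EMA smoothing of gradients, and rescaling every task gradient to the maximum gradient norm) can be combined into one update direction. The following is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions made for illustration: the function name `db_mtl_direction`, the EMA form `beta * previous + (1 - beta) * current` (chosen so that β=0 disables the EMA, as in the finding above), a constant β, the small `eps` stabilizer, and treating all parameters as shared.

```python
import numpy as np


def db_mtl_direction(losses, grads, ema_state, beta=0.1, eps=1e-8):
    """Return (update_direction, new_ema_state) for one training step.

    losses:    shape (T,), positive per-task loss values.
    grads:     shape (T, D), gradients of the raw task losses.
    ema_state: shape (T, D), previous EMA of log-loss gradients.
    beta:      EMA forgetting rate (assumed convention; 0 disables EMA).
    """
    losses = np.asarray(losses, dtype=float)
    grads = np.asarray(grads, dtype=float)

    # Loss-scale balancing: grad of log(loss) is grad(loss) / loss.
    log_grads = grads / (losses[:, None] + eps)

    # Smooth the per-task gradients with an exponential moving average.
    new_ema = beta * ema_state + (1.0 - beta) * log_grads

    # Gradient-magnitude balancing: rescale every task gradient to
    # alpha, the largest gradient norm among tasks.
    norms = np.linalg.norm(new_ema, axis=1)
    alpha = norms.max()
    normalized = new_ema / (norms[:, None] + eps)

    direction = alpha * normalized.sum(axis=0)
    return direction, new_ema


# Hypothetical two-task example with very different loss scales.
losses = [100.0, 0.01]
grads = [[300.0, 400.0], [0.0, 0.002]]
state = np.zeros((2, 2))
direction, state = db_mtl_direction(losses, grads, state, beta=0.0)
print(direction)
```

In this toy example the raw gradient norms differ by a factor of 250,000, yet after both balancing steps each task contributes a gradient of the same norm to the update. A real training loop would carry `state` across iterations and apply `direction` through the optimizer.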