Dual-balancing multi-task learning

DB-MTL (Dual-Balancing for Multi-Task Learning) jointly balances loss scales and gradient magnitudes across tasks; it is benchmarked on NYUv2 and Office-31.

26 members.

Sub-communities (7)

Finer clusters this community splits into. Each is its own community page.

Loss-scale balancing via logarithmic transformation (7)
Dynamic balancing for multi-task learning (5)
Gradient norm scaling for multitask learning (4)
Multi-task learning gradient balancing (3)
Multi-task learning gradient balancing (3)
Gradient conflict mitigation in multi-task learning (2)
Gradient magnitude balancing for multitask learning (2)

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Dual-Balancing for Multi-Task Learning (26 members)

Bridges (8)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Dual balancing multi-task learning (21 shared)
Loss-scale balancing via logarithmic transformation (7 shared)
Dynamic balancing for multi-task learning (5 shared)
Gradient norm scaling for multitask learning (4 shared)
Multi-task learning gradient balancing (3 shared)
Multi-task learning gradient balancing (3 shared)
Gradient conflict mitigation in multi-task learning (2 shared)
Gradient magnitude balancing for multitask learning (2 shared)

Claims (13)

DB-MTL does not affect training stability; losses converge smoothly. (Training stability claim.)
DB-MTL is a simple yet effective method that addresses both loss-scale and gradient-magnitude imbalances. (Core claim of the paper.)
DB-MTL reduces gradient conflict and improves task balance compared to equal weighting (EW). (Effect on gradient conflict.)
IMTL-L is equivalent to the logarithm transformation when its parameter s_t is the exact minimizer in each iteration. (Mathematical relationship between IMTL-L and the log transformation.)
The logarithm transformation is simpler and more effective than a learnable loss transformation. (Compared to IMTL-L: parameter-free, no extra computational cost, achieves the same theoretical goal.)
Loss-scale balancing and gradient-magnitude balancing are complementary, and combining them achieves the best performance. (Ablation conclusion.)
Setting the scaling factor of the aggregated gradient to the maximum gradient norm performs best for task balancing. (Empirical finding on the choice of α_k in the gradient normalization strategy.)
Setting α_k to the maximum gradient norm among tasks performs best. (Recommended strategy for gradient normalization.)
Task balancing requires simultaneous consideration of both loss scales and gradient magnitudes. (Core interpretive position of DB-MTL: complementarity of the loss and gradient perspectives.)
The logarithm transformation also benefits existing gradient balancing methods. (Generalization of the loss transformation.)
The logarithm transformation is simpler and more effective than IMTL-L because it is parameter-free. (Comparison of loss-scale balancing techniques.)
The magnitude of the normalized gradients (the choice of α_k) plays an important role in performance. (Insight about gradient normalization scaling.)
The proposed gradient-magnitude balancing method consistently outperforms GradNorm because it guarantees equal gradient magnitudes across tasks and accounts for the update magnitude. (Advantage over GradNorm.)
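
The claims above describe two ingredients: a logarithm transformation of each task loss, and a normalization of task gradients to equal magnitude followed by rescaling with α_k set to the maximum gradient norm. The following is a minimal NumPy sketch of how one such update could be assembled. It is not the paper's implementation. The function name `dual_balanced_update`, the `eps` guard, the zero initialization of the moving average, and the exact EMA form (β weighting the previous average, so β = 0 disables it) are assumptions made for illustration.

```python
import numpy as np

def dual_balanced_update(losses, grads, ema_grads=None, beta=0.0, eps=1e-8):
    """Hypothetical helper: combine per-task gradients with dual balancing.

    losses    : shape (T,), positive task losses
    grads     : shape (T, D), gradient of each raw loss w.r.t. shared parameters
    ema_grads : shape (T, D) moving average from the previous step, or None
    beta      : assumed EMA forgetting rate; beta = 0 means no EMA
    Returns (update, new_ema_grads).
    """
    losses = np.asarray(losses, dtype=float)
    grads = np.asarray(grads, dtype=float)

    # Loss-scale balancing: by the chain rule, grad of log(loss) = grad / loss.
    log_grads = grads / (losses[:, None] + eps)

    # Assumed EMA smoothing of the per-task gradients (starts from zeros).
    if ema_grads is None:
        ema_grads = np.zeros_like(log_grads)
    new_ema = beta * ema_grads + (1.0 - beta) * log_grads

    # Gradient-magnitude balancing: rescale every task gradient to unit norm,
    # then scale the sum by alpha = maximum gradient norm among tasks.
    norms = np.linalg.norm(new_ema, axis=1)
    alpha = norms.max()
    unit_grads = new_ema / (norms[:, None] + eps)
    update = alpha * unit_grads.sum(axis=0)
    return update, new_ema

# Two toy tasks whose losses differ by two orders of magnitude.
losses = [10.0, 0.1]
grads = [[30.0, 40.0], [0.0, 0.1]]
update, ema = dual_balanced_update(losses, grads)
print(update)  # approximately [3. 9.]
```

In this toy case the log-transformed gradients are [3, 4] and [0, 1], with norms 5 and 1. After normalization both tasks contribute a vector of norm 5 to the update, so neither dominates because of its loss scale or gradient size.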

Findings (13)

Combining loss-scale and gradient-magnitude balancing achieves Δp = +1.15±0.16 on NYUv2. (Full DB-MTL ablation result.)
DB-MTL achieves Δp = +1.15±0.16 on NYUv2, outperforming all baselines, including state-of-the-art methods. (Primary empirical validation on a scene understanding task.)
DB-MTL has a per-epoch running time on NYUv2 similar to that of gradient balancing methods and slower than that of loss balancing methods. (Computational efficiency comparison.)
DB-MTL increases gradient cosine similarity faster and keeps it positive on Office-31, reducing gradient conflict relative to EW. (Analysis of gradient conflict reduction.)
DB-MTL training losses decrease smoothly and gradient norms are lower than under EW on NYUv2, indicating training stability. (Training stability analysis.)
DB-MTL with an EMA forgetting rate β in a wide range performs better than without EMA (β = 0) on Office-31. (Effect of EMA forgetting rate on performance.)
DB-MTL with a SegNet backbone achieves Δp = +8.91 on NYUv2, the best among all methods. (Performance with a different backbone network.)
log(x) = min_s (e^s * x - s - 1) for x > 0. (Mathematical identity showing that IMTL-L reduces to the logarithm transformation when s is the exact minimizer.)
The logarithm transformation improves PCGrad, GradVac, IMTL-G, CAGrad, Nash-MTL, and Aligned-MTL on NYUv2 (Figure 1). (Effectiveness of the logarithm transformation as a plug-in for gradient balancing methods.)
On NYUv2, EW suffers a drop in surface normal prediction relative to STL (mean angle error 23.57 vs. 21.99; within 11.25°: 35.04 vs. 39.04). (Task balancing issue where surface normal prediction degrades under EW.)
Setting α_k to the maximum gradient norm performs best among tested strategies on NYUv2 (Figure 6). (Sensitivity analysis for the gradient normalization scaling factor.)
The gradient-magnitude balancing method outperforms GradNorm on NYUv2, Cityscapes, Office-31, and Office-Home. (Comparison of gradient-magnitude balancing with GradNorm.)
The logarithm transformation (loss-scale balancing) consistently outperforms IMTL-L on NYUv2, Cityscapes, Office-31, and Office-Home. (Comparison of loss-scale balancing with IMTL-L.)
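
The identity log(x) = min_s (e^s * x - s - 1) listed among the findings can be checked directly. The objective is convex in s (its second derivative e^s * x is positive), and setting the first derivative e^s * x - 1 to zero gives s = -log(x), where the objective equals log(x). The short script below is a numerical sanity check of that derivation, not code from the source; the names `objective`, `xs`, and `s_grid` are chosen for illustration.

```python
import numpy as np

def objective(s, x):
    """The expression e^s * x - s - 1 that is minimized over s."""
    return np.exp(s) * x - s - 1.0

xs = np.array([0.01, 0.5, 1.0, 3.0, 250.0])

# Closed form: plug the stationary point s* = -log(x) into the objective.
closed_form = objective(-np.log(xs), xs)

# Brute force: minimize over a dense grid of s values that contains every s*.
s_grid = np.linspace(-10.0, 10.0, 200001)
grid_min = np.array([objective(s_grid, x).min() for x in xs])

print(np.allclose(closed_form, np.log(xs)))           # True
print(np.allclose(grid_min, np.log(xs), atol=1e-6))   # True
```

Both the closed-form minimizer and the grid search recover log(x), consistent with the claim that a learnable scale parameter adds nothing once it is set to its exact minimizer at each iteration.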