Dual balancing multi-task learning
community:leiden_hybrid_concepts-run2-c9
DB-MTL (Dual-Balancing Multi-Task Learning) combines loss-scale balancing with gradient-magnitude balancing and is benchmarked on the NYUv2, Cityscapes, QM9, and Office datasets.
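The combination described above can be sketched as one function that turns per-task gradients into a single update direction for the shared parameters. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The helper name `db_mtl_direction`, the flattened-gradient interface, the constant forgetting rate `beta`, and the zero-initialized EMA state are illustrative choices. The logarithm transformation, the EMA, and scaling by the maximum gradient norm (α_k) follow the claims and findings on this page.

```python
import numpy as np


def db_mtl_direction(task_grads, task_losses, ema_grads=None, beta=0.9, eps=1e-8):
    """Hypothetical helper sketching one DB-MTL-style update direction.

    task_grads:  list of 1-D arrays, gradient of each raw task loss l_t
                 with respect to the shared parameters.
    task_losses: list of positive floats, the current value of each l_t.
    ema_grads:   list of EMA states from the previous step (None at step 1).
    beta:        EMA forgetting rate (assumed constant); beta = 0 disables the EMA.
    """
    # Loss-scale balancing: by the chain rule, grad log(l_t) = grad(l_t) / l_t.
    log_grads = [g / max(l, eps) for g, l in zip(task_grads, task_losses)]

    # EMA of the log-loss gradients (assumption: states start at zero).
    if ema_grads is None:
        ema_grads = [np.zeros_like(g) for g in log_grads]
    ema_grads = [beta * m + (1.0 - beta) * g for m, g in zip(ema_grads, log_grads)]

    # Gradient-magnitude balancing: rescale every task gradient to the same
    # norm, alpha_k = the largest norm among the tasks.
    norms = [np.linalg.norm(m) for m in ema_grads]
    alpha = max(norms)
    direction = sum(alpha * m / max(n, eps) for m, n in zip(ema_grads, norms))
    return direction, ema_grads
```

An optimizer would then step with θ ← θ − η · direction and pass the returned `ema_grads` back in on the next iteration. Because every task contributes a vector of norm α_k, no task dominates the sum through the scale of its loss or the size of its gradient. The maximum norm is used for α_k here because the claims report that this choice performs best.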
22 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
- Dual-Balancing for Multi-Task Learning (29 members)
Bridges (10)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
- Dual-balancing multi-task learning (21 shared)
- Loss-scale balancing via logarithmic transformation (5 shared)
- Dynamic balancing for multi-task learning (5 shared)
- Multi-task learning gradient balancing (3 shared)
- Gradient norm scaling for multitask learning (3 shared)
- Multi-task learning gradient balancing (2 shared)
- Gradient conflict mitigation in multi-task learning (2 shared)
- Gradient magnitude balancing for multitask learning (1 shared)
- Skill-based system design principles (1 shared)
- Design principles for care-centered systems (1 shared)
Claims (14)
- DB-MTL does not affect training stability; its losses converge smoothly. (Training stability claim.)
- DB-MTL is a simple yet effective method that addresses both loss-scale and gradient-magnitude imbalances. (Core claim of the paper.)
- DB-MTL reduces gradient conflict and improves task balance compared to EW (equal weighting). (Effect on gradient conflict.)
- IMTL-L is equivalent to the logarithm transformation when its parameter s_t is set to the exact minimizer in each iteration. (Mathematical relationship between IMTL-L and the logarithm transformation.)
- The logarithm transformation is simpler and more effective than a learnable loss transformation. (Compared to IMTL-L, it is parameter-free, adds no computational cost, and achieves the same theoretical goal.)
- Loss-scale balancing and gradient-magnitude balancing are complementary; combining them achieves the best performance. (Ablation conclusion.)
- Setting the aggregated gradient scaling factor to the maximum gradient norm performs best for task balancing. (Empirical finding on the choice of α_k in the gradient normalization strategy.)
- Setting α_k to the maximum gradient norm among tasks performs best. (Recommended strategy for gradient normalization.)
- Task balancing is still an open problem in multi-task learning. (Motivation for the proposed method.)
- Task balancing requires considering loss scales and gradient magnitudes simultaneously. (Core interpretive position of DB-MTL: the loss and gradient perspectives are complementary.)
- The logarithm transformation also benefits existing gradient balancing methods. (Generalization of the loss transformation.)
- The logarithm transformation is simpler and more effective than IMTL-L because it is parameter-free. (Comparison of loss-scale balancing techniques.)
- The magnitude of the normalized gradients (the choice of α_k) plays an important role in performance. (Insight about gradient normalization scaling.)
- The proposed gradient-magnitude balancing method consistently outperforms GradNorm because it guarantees equal gradient magnitudes and takes the magnitude of the update into account. (Advantage over GradNorm.)
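The IMTL-L equivalence stated in the claims follows from a short calculation. As an assumption recalled from the IMTL paper rather than stated on this page, take IMTL-L's transformed loss for task t to be e^{s_t} ℓ_t(θ) − s_t with a learnable scalar s_t. Minimizing over s_t in closed form gives:

```latex
\begin{aligned}
\tilde{\ell}_t(\theta, s_t) &= e^{s_t}\,\ell_t(\theta) - s_t, \\
\frac{\partial \tilde{\ell}_t}{\partial s_t} &= e^{s_t}\,\ell_t(\theta) - 1 = 0
  \;\Longrightarrow\; s_t^\star = -\log \ell_t(\theta), \\
\tilde{\ell}_t(\theta, s_t^\star) &= e^{-\log \ell_t(\theta)}\,\ell_t(\theta) + \log \ell_t(\theta)
  = 1 + \log \ell_t(\theta), \\
\nabla_\theta \tilde{\ell}_t(\theta, s_t)\big|_{s_t = s_t^\star} &= e^{s_t^\star}\,\nabla_\theta \ell_t(\theta)
  = \frac{\nabla_\theta \ell_t(\theta)}{\ell_t(\theta)} = \nabla_\theta \log \ell_t(\theta).
\end{aligned}
```

Since the second derivative e^{s_t} ℓ_t is positive whenever ℓ_t > 0, s_t^⋆ is the unique minimizer. The resulting objective differs from log ℓ_t only by a constant, and its gradient with respect to θ is exactly ∇ log ℓ_t. This is why the parameter-free logarithm transformation reaches the same target without learning s_t.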
Findings (8)
- Combining loss-scale and gradient-magnitude balancing achieves Δp = +1.15±0.16 on NYUv2. (Full DB-MTL ablation result.)
- DB-MTL achieves Δp = +1.15±0.16 on NYUv2, outperforming all baselines, including state-of-the-art methods. (Primary empirical validation on a scene-understanding task.)
- On NYUv2, DB-MTL's per-epoch running time is similar to that of gradient balancing methods and slower than that of loss balancing methods. (Computational efficiency comparison.)
- On Office-31, DB-MTL increases gradient cosine similarity faster than EW and keeps it positive, reducing gradient conflict. (Analysis of gradient conflict reduction.)
- On NYUv2, DB-MTL's training losses decrease smoothly and its gradient norms are lower than EW's, indicating training stability. (Training stability analysis.)
- On Office-31, DB-MTL performs better with an EMA forgetting rate β across a wide range of values than without EMA (β = 0). (Effect of the EMA forgetting rate on performance.)
- With a SegNet backbone, DB-MTL achieves Δp = +8.91 on NYUv2, the best among all methods. (Performance with a different backbone network.)
- The logarithm transformation improves PCGrad, GradVac, IMTL-G, CAGrad, Nash-MTL, and Aligned-MTL on NYUv2, as shown in Figure 1. (Effectiveness of the logarithm transformation as a plug-in for gradient balancing methods.)
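The Δp values above are relative-improvement scores measured against single-task learning (STL) baselines, where a positive value means the multi-task model does better. This page does not define the metric, so the sketch below assumes a common definition: for each task, average the signed relative change of every metric (with the sign flipped for metrics where lower is better), then average over tasks and multiply by 100. The helper name `delta_p` and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def delta_p(mtl, stl, lower_is_better):
    """Hypothetical helper: average relative improvement (%) over STL baselines.

    mtl, stl:        dict mapping task name -> list of metric values (same order).
    lower_is_better: dict mapping task name -> list of bools, True for metrics
                     such as error or RMSE where a smaller value is better.
    """
    per_task = []
    for task, base_vals in stl.items():
        terms = []
        for new, base, lower in zip(mtl[task], base_vals, lower_is_better[task]):
            # Flip the sign for lower-is-better metrics so that any
            # improvement counts as positive.
            sign = -1.0 if lower else 1.0
            terms.append(sign * (new - base) / base)
        # Average within a task first, so tasks with many metrics
        # do not dominate the final score.
        per_task.append(sum(terms) / len(terms))
    # Then average across tasks and express the result as a percentage.
    return 100.0 * sum(per_task) / len(per_task)
```

For example, suppose a segmentation score rises from 0.50 to 0.55 (higher is better) and a depth error falls from 0.40 to 0.36 (lower is better). Each change is a 10% relative improvement, so `delta_p` returns roughly +10.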