community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run4-c14
Dual-balancing multi-task learning
DB-MTL jointly balances loss scale and gradient magnitude, benchmarked on NYUv2 and Office-31.
26 members.
Sub-communities (7)
Finer clusters this community splits into. Each is its own community page.
- Loss-scale balancing via logarithmic transformation (7)
- Dynamic balancing for multi-task learning (5)
- Gradient norm scaling for multitask learning (4)
- Multi-task learning gradient balancing (3)
- Multi-task learning gradient balancing (3)
- Gradient conflict mitigation in multi-task learning (2)
- Gradient magnitude balancing for multitask learning (2)
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
- Dual-Balancing for Multi-Task Learning (26 members)
Bridges (8)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
- Dual balancing multi-task learning (21 shared)
- Loss-scale balancing via logarithmic transformation (7 shared)
- Dynamic balancing for multi-task learning (5 shared)
- Gradient norm scaling for multitask learning (4 shared)
- Multi-task learning gradient balancing (3 shared)
- Multi-task learning gradient balancing (3 shared)
- Gradient conflict mitigation in multi-task learning (2 shared)
- Gradient magnitude balancing for multitask learning (2 shared)
Claims (13)
- DB-MTL does not affect training stability; losses converge smoothly. (Training stability claim.)
- DB-MTL is a simple yet effective method that addresses both loss-scale and gradient-magnitude imbalances. (Core claim of the paper.)
- DB-MTL reduces gradient conflict and improves task balance compared to EW. (Effect on gradient conflict.)
- IMTL-L is equivalent to the logarithm transformation when its parameter s_t is the exact minimizer in each iteration. (Mathematical relationship between IMTL-L and the logarithm transformation.)
- Logarithm transformation is simpler and more effective than learnable loss transformation. (Compared to IMTL-L: parameter-free, no extra computational cost, achieves the same theoretical goal.)
- Loss-scale balancing and gradient-magnitude balancing are complementary and combining them achieves the best performance. (Ablation conclusion.)
- Setting aggregated gradient scaling factor to maximum gradient norm performs best for task balancing. (Empirical finding on choice of αk in gradient normalization strategy.)
- Setting αk as the maximum gradient norm among tasks performs best. (Recommended strategy for gradient normalization.)
- Task balancing requires simultaneous consideration of both loss scales and gradient magnitudes. (Core interpretive position of DB-MTL: complementarity of loss and gradient perspectives.)
- The logarithm transformation also benefits existing gradient balancing methods. (Generalization of the loss transformation.)
- The logarithm transformation is simpler and more effective than IMTL-L because it is parameter-free. (Comparison of loss-scale balancing techniques.)
- The magnitude of the normalized gradients (choice of αk) plays an important role in performance. (Insight about gradient normalization scaling.)
- The proposed gradient-magnitude balancing method consistently outperforms GradNorm, as it guarantees equal gradient magnitudes and considers update magnitude. (Advantage over GradNorm.)
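The claims above describe two steps: a logarithm transformation of each task loss, then rescaling every task gradient to a common norm with αk set to the largest gradient norm. The following is a minimal sketch of how those two steps could fit together, not the paper's reference implementation. The function names, the `eps` stabilizer, and the exact form of the EMA update (with `beta = 0` meaning no smoothing) are assumptions made for illustration.

```python
import math


def log_loss_gradients(losses, raw_grads, eps=1e-8):
    """Gradients of log(loss_k + eps), given gradients of the raw losses.

    By the chain rule, grad log(loss_k + eps) = grad loss_k / (loss_k + eps),
    so tasks with large loss values no longer dominate by scale alone.
    """
    return [[g / (loss + eps) for g in grad]
            for loss, grad in zip(losses, raw_grads)]


def balance_gradients(task_grads, ema_grads=None, beta=0.0, eps=1e-8):
    """Rescale per-task gradients to a common norm and sum them.

    task_grads: list of K gradient vectors (lists of floats).
    ema_grads:  smoothed gradients from the previous step, or None.
    beta:       assumed EMA forgetting rate; 0 disables smoothing.
    Returns (aggregated_gradient, updated_ema_grads).
    """
    if ema_grads is None:
        ema_grads = task_grads  # first step: no history yet
    smoothed = [[beta * e + (1.0 - beta) * g for e, g in zip(e_k, g_k)]
                for e_k, g_k in zip(ema_grads, task_grads)]
    norms = [math.sqrt(sum(x * x for x in g_k)) for g_k in smoothed]
    alpha = max(norms)  # scale shared by all tasks: the maximum gradient norm
    aggregated = [0.0] * len(smoothed[0])
    for g_k, n_k in zip(smoothed, norms):
        for i, x in enumerate(g_k):
            aggregated[i] += alpha * x / (n_k + eps)
    return aggregated, smoothed


# Two hypothetical tasks whose losses differ by a factor of 100.
log_grads = log_loss_gradients([100.0, 1.0], [[100.0, 0.0], [0.0, 1.0]])
update, ema_state = balance_gradients(log_grads)
```

In this toy case the raw gradients differ in norm by 100x, but after the logarithm transformation and normalization each task contributes a vector of the same length to `update`.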
Findings (13)
- Combining loss-scale and gradient-magnitude balancing achieves Δp = +1.15±0.16 on NYUv2. (Full DB-MTL ablation result.)
- DB-MTL achieves Δp = +1.15±0.16 on NYUv2, outperforming all baselines, including state-of-the-art methods. (Primary empirical validation on scene understanding task.)
- DB-MTL has similar per-epoch running time to gradient balancing methods on NYUv2, slower than loss balancing methods. (Computational efficiency comparison.)
- DB-MTL increases gradient cosine similarity faster and keeps it positive on Office-31, reducing gradient conflict vs EW. (Analysis of gradient conflict reduction.)
- DB-MTL training losses decrease smoothly and gradient norms are lower than EW on NYUv2, indicating training stability. (Training stability analysis.)
- DB-MTL with EMA forgetting rate β in a wide range performs better than without EMA (β=0) on Office-31. (Effect of EMA forgetting rate on performance.)
- DB-MTL with SegNet backbone achieves Δp = +8.91 on NYUv2, best among all methods. (Performance with a different backbone network.)
- log(x) = min_s (e^s * x - s - 1) for x > 0. (Mathematical identity showing that IMTL-L reduces to the logarithm transformation when s is the exact minimizer.)
- Logarithm transformation improves PCGrad, GradVac, IMTL-G, CAGrad, Nash-MTL, and Aligned-MTL on NYUv2 (Figure 1). (Effectiveness of logarithm transformation as a plug-in for gradient balancing methods.)
- On NYUv2, EW suffers a drop in surface normal prediction (mean angle error 23.57 vs STL 21.99, within 11.25° 35.04 vs 39.04). (Task balancing issue where surface normal prediction degrades under EW.)
- Setting αk to the maximum gradient norm performs best among tested strategies on NYUv2 (Figure 6). (Sensitivity analysis for gradient normalization scaling factor.)
- The gradient-magnitude balancing method outperforms GradNorm on NYUv2, Cityscapes, Office-31, Office-Home. (Comparison of gradient-magnitude balancing with GradNorm.)
- The logarithm transformation (loss-scale balancing) consistently outperforms IMTL-L on NYUv2, Cityscapes, Office-31, Office-Home. (Comparison of loss-scale balancing with IMTL-L.)
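The identity log(x) = min_s (e^s * x - s - 1) listed in the findings can be checked directly. The objective is convex in s, and setting its derivative e^s * x - 1 to zero gives s = -log(x), where the objective evaluates to 1 + log(x) - 1 = log(x). The short script below is a numerical sanity check of that reasoning, not code from the paper; the function names and the test values of x are illustrative assumptions.

```python
import math


def learnable_scale_objective(s, x):
    """The quantity minimized over s in the identity: e^s * x - s - 1."""
    return math.exp(s) * x - s - 1.0


def check_log_identity(x, offsets=(-1.0, -0.1, 0.1, 1.0), tol=1e-9):
    """Return True if s = -log(x) attains log(x) and nearby s do worse."""
    s_star = -math.log(x)  # stationary point of the convex objective
    at_min = learnable_scale_objective(s_star, x)
    matches_log = abs(at_min - math.log(x)) < tol
    neighbours_worse = all(
        learnable_scale_objective(s_star + d, x) > at_min for d in offsets
    )
    return matches_log and neighbours_worse


# Loss values spanning several orders of magnitude (arbitrary choices).
results = {x: check_log_identity(x) for x in (0.01, 1.0, 5.0, 1000.0)}
```

Because the minimizer has a closed form, the logarithm transformation needs no extra learnable parameter, which matches the "parameter-free" comparison with IMTL-L made in the claims.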