finding

active

finding:db-mtl-increases-gradient-cosine-similarity-faster-and-keeps-it-positive-on-office-31-reducing-gradient-conflict-vs-ew

On Office-31, DB-MTL increases gradient cosine similarity faster than equal weighting (EW) and keeps it positive, reducing gradient conflict.

Analysis of gradient conflict reduction.
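As a rough illustration of the quantity this finding tracks, the sketch below computes the cosine similarity between flattened task gradients; a negative value indicates conflicting gradients. The function names and the choice to average over all task pairs are assumptions made for illustration, since this entry does not state how the paper aggregates the similarity.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        # Assumption: treat a zero gradient as neither aligned nor conflicting.
        return 0.0
    return dot / (n1 * n2)


def mean_pairwise_cosine(task_grads):
    """Average cosine similarity over all unordered pairs of task gradients.

    Assumption: pairwise averaging is one plausible aggregate,
    not necessarily the one used in the paper.
    """
    n = len(task_grads)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        raise ValueError("need at least two task gradients")
    total = sum(cosine_similarity(task_grads[i], task_grads[j]) for i, j in pairs)
    return total / len(pairs)
```

Logging this value once per epoch for both DB-MTL and EW would give curves comparable in spirit to the ones the finding describes.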

Source paper

extracted_from

Dual-Balancing for Multi-Task Learning

(2023) · Baijiong Lin · Weisen Jiang · Feiyang Ye · Yu Zhang +5

Neighborhood — ranked by edge-count

Claims (1)

claim

DB-MTL reduces gradient conflict and improves task balance compared to EW.
supports
Effect on gradient conflict.

Communities (3)

community

Dual-balancing multi-task learning
members_of
DB-MTL jointly balances loss scale and gradient magnitude, benchmarked on NYUv2 and Office-31.
Dual balancing multi-task learning
members_of
DB-MTL combines loss-scale and gradient-magnitude balancing, benchmarked across NYUv2, Cityscapes, QM9, and Office datasets.
Gradient conflict mitigation in multi-task learning
members_of
Dynamic balancing methods that increase gradient alignment and reduce task interference, evaluated on Office-31 domain adaptation.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DB-MTL is a simple yet effective method that addresses both loss-scale and gradient-magnitude imbalances. · claim · 0.834
Core claim of the paper.
DB-MTL achieves loss-scale balancing by performing logarithm transformation on each task loss, and rescales gradient magnitudes by normalizing all task gradients to comparable magnitudes using the maximum gradient norm. · quote · 0.821
Concise summary of the DB-MTL method from the abstract.
DB-MTL has similar per-epoch running time to gradient balancing methods on NYUv2, slower than loss balancing methods. · finding · 0.816
Computational efficiency comparison.
DB-MTL does not affect training stability; losses converge smoothly. · claim · 0.812
Training stability claim.
DB-MTL with EMA forgetting rate β in a wide range performs better than without EMA (β=0) on Office-31. · finding · 0.804
Effect of EMA forgetting rate on performance.
DB-MTL training losses decrease smoothly and gradient norms are lower than EW on NYUv2, indicating training stability. · finding · 0.803
Training stability analysis.
The proposed gradient-magnitude balancing method consistently outperforms GradNorm, as it guarantees equal gradient magnitudes and considers update magnitude. · claim · 0.768
Advantage over GradNorm.
As can be seen, most MTL baselines perform better than STL on semantic segmentation and depth estimation, but have a large drop on the surface normal prediction task, suffering from the task balancing problem. · quote · 0.764
Observation illustrating the task balancing problem on NYUv2.
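The method summary quoted in the list above (logarithm transformation of each task loss, then rescaling all task gradients to the maximum gradient norm, with an EMA forgetting rate β) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name db_mtl_update, the eps constant, the zero-initialized EMA state, and the final summation of rescaled gradients are all illustrative choices. The gradient of log(loss) is obtained from the raw-loss gradient by the chain rule, i.e. by dividing by the loss.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat vector."""
    return math.sqrt(sum(x * x for x in v))


def db_mtl_update(losses, grads, ema_state=None, beta=0.0, eps=1e-8):
    """Hypothetical sketch of a dual-balancing update direction.

    losses:    positive scalar loss per task
    grads:     gradient of each raw loss w.r.t. the shared parameters (flat lists)
    ema_state: smoothed gradients from the previous step (assumed zeros at start)
    beta:      EMA forgetting rate; beta=0 disables smoothing
    """
    # Loss-scale balancing: grad of log(loss) equals grad(loss) / loss.
    log_grads = [[x / (loss + eps) for x in g] for loss, g in zip(losses, grads)]

    # Assumption: the EMA state starts at zero when none is supplied.
    if ema_state is None:
        ema_state = [[0.0] * len(g) for g in log_grads]
    smoothed = [
        [beta * s + (1.0 - beta) * x for s, x in zip(prev, cur)]
        for prev, cur in zip(ema_state, log_grads)
    ]

    # Gradient-magnitude balancing: rescale every task gradient to the max norm.
    norms = [l2_norm(g) for g in smoothed]
    max_norm = max(norms)
    update = [0.0] * len(smoothed[0])
    for g, n in zip(smoothed, norms):
        scale = max_norm / (n + eps)
        for i, x in enumerate(g):
            update[i] += scale * x  # assumption: rescaled gradients are summed
    return update, smoothed
```

With beta=0 the smoothed gradients equal the current log-loss gradients, which corresponds to the "without EMA" setting mentioned in the Office-31 finding above.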