finding

active

finding:db-mtl-training-losses-decrease-smoothly-and-gradient-norms-are-lower-than-ew-on-nyuv2-indicating-training-stability

DB-MTL training losses decrease smoothly and gradient norms are lower than EW on NYUv2, indicating training stability.

Training stability analysis.
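The finding compares DB-MTL with equal weighting (EW) on loss smoothness and gradient norm. The sketch below shows one way those two quantities could be tracked for a run. It is written in NumPy and assumes that EW sums the raw per-task gradients. The function names and the "fraction of steps where the loss increased" statistic are hypothetical illustrations, not the paper's measurement code.

```python
import numpy as np


def ew_direction(task_grads):
    """Equal weighting (EW) baseline, assumed here to be the plain sum of
    per-task gradients w.r.t. the shared parameters.

    task_grads: array-like of shape (K, d), one gradient per task.
    """
    return np.asarray(task_grads, dtype=float).sum(axis=0)


def stability_summary(loss_curve, grad_norms):
    """Hypothetical stability statistics for one training run.

    loss_curve: training loss recorded per epoch (or per step).
    grad_norms: L2 norm of the update direction at each step.
    """
    losses = np.asarray(loss_curve, dtype=float)
    norms = np.asarray(grad_norms, dtype=float)
    diffs = np.diff(losses)
    # Fraction of consecutive points where the loss went up (0.0 = monotone decrease)
    frac_up = float((diffs > 0).mean()) if diffs.size else 0.0
    return {
        "frac_increasing_steps": frac_up,
        "mean_grad_norm": float(norms.mean()),
        "max_grad_norm": float(norms.max()),
    }


# Toy usage with made-up numbers (not results from the paper)
g = ew_direction([[3.0, 4.0], [0.0, 0.0]])
print(np.linalg.norm(g))  # 5.0
print(stability_summary([3.0, 2.0, 2.5, 1.0], [1.0, 2.0, 3.0]))
```

Logging the same statistics for an EW run and a DB-MTL run would allow a side-by-side comparison of the kind the finding describes.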

Source paper

extracted_from

Dual-Balancing for Multi-Task Learning

(2023) · Baijiong Lin · Weisen Jiang · Feiyang Ye · Yu Zhang +5

Neighborhood — ranked by edge-count

Claims (1)

claim

DB-MTL does not affect training stability; losses converge smoothly.
supports
Training stability claim.

Communities (3)

community

Dual-balancing multi-task learning
members_of
DB-MTL jointly balances loss scale and gradient magnitude, benchmarked on NYUv2 and Office-31.
Dual balancing multi-task learning
members_of
DB-MTL combines loss-scale and gradient-magnitude balancing, benchmarked across NYUv2, Cityscapes, QM9, and Office datasets.
Dynamic balancing for multi-task learning
members_of
Explores gradient/loss balancing techniques with exponential moving average forgetting rates, evaluated on dense prediction tasks like semantic segmentation.
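The community summaries describe DB-MTL as balancing loss scale and gradient magnitude together. The NumPy sketch below computes one combined update direction for the shared parameters. It assumes the rule given in the paper's abstract: apply a logarithm to each task loss, then rescale every task gradient to the maximum gradient norm. The function name and the `eps` guard are hypothetical. The sketch also omits any smoothing of gradients across steps that the actual method may apply.

```python
import numpy as np


def db_mtl_direction(losses, loss_grads, eps=1e-12):
    """Sketch of a dual-balanced update direction for shared parameters.

    losses: K strictly positive task losses l_k (scalars).
    loss_grads: array of shape (K, d), gradient of each raw loss l_k.
    eps: hypothetical guard against division by a zero gradient norm.
    """
    losses = np.asarray(losses, dtype=float)
    grads = np.asarray(loss_grads, dtype=float)
    if np.any(losses <= 0):
        raise ValueError("log transform needs strictly positive losses")
    # Loss-scale balancing: grad log(l_k) = grad(l_k) / l_k (chain rule)
    log_grads = grads / losses[:, None]
    norms = np.linalg.norm(log_grads, axis=1)
    # Gradient-magnitude balancing: every task gradient is rescaled to the max norm
    alpha = norms.max()
    unit = log_grads / np.maximum(norms, eps)[:, None]
    return alpha * unit.sum(axis=0)


# Toy usage: task 2's gradient is 50x smaller but contributes equally after rescaling
print(db_mtl_direction([1.0, 1.0], [[3.0, 4.0], [0.0, 0.1]]))
```

Because each task contributes a vector of the same length to the sum, a task with a small loss or a small gradient is not drowned out by the others.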

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DB-MTL has a per-epoch running time similar to gradient balancing methods on NYUv2 but is slower than loss balancing methods. (finding · 0.835)
Computational efficiency comparison.
DB-MTL reduces gradient conflict and improves task balance compared to EW. (claim · 0.809)
Effect on gradient conflict.
DB-MTL increases gradient cosine similarity faster than EW on Office-31 and keeps it positive, reducing gradient conflict. (finding · 0.803)
Analysis of gradient conflict reduction.
DB-MTL achieves loss-scale balancing by performing logarithm transformation on each task loss, and rescales gradient magnitudes by normalizing all task gradients to comparable magnitudes using the maximum gradient norm. (quote · 0.792)
Concise summary of the DB-MTL method from the abstract.
Current training methods rely on loss minimization, meaning the experiential profile of training is predominantly negative across billions of parameter updates. (claim · 0.792)
Ethical implication about the nature of AI training experience if the thesis holds.
Setting α_k to the maximum gradient norm performs best among the tested strategies on NYUv2 (Figure 6). (finding · 0.790)
Sensitivity analysis for the gradient normalization scaling factor.
DB-MTL is a simple yet effective method that addresses both loss-scale and gradient-magnitude imbalances. (claim · 0.784)
Core claim of the paper.
DB-MTL achieves ∆p = +1.15 ± 0.16 on NYUv2, outperforming all baselines, including state-of-the-art methods. (finding · 0.769)
Primary empirical validation on the scene understanding task.
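The ∆p figure reported on NYUv2 is a relative-improvement metric for multi-task models. The sketch below assumes the common definition: the mean, over all evaluation metrics, of the relative change from a single-task baseline, with the sign flipped for metrics where lower values are better, expressed as a percentage. Some papers first average within each task and then across tasks, so the exact aggregation used for +1.15 may differ. The function name and the toy numbers are hypothetical.

```python
from typing import Sequence


def delta_p(mtl: Sequence[float], stl: Sequence[float],
            lower_is_better: Sequence[bool]) -> float:
    """Average relative improvement (%) of an MTL model over single-task baselines.

    Each metric's relative change is sign-flipped when lower values are better,
    so a positive result means the MTL model improves on average.
    """
    if not (len(mtl) == len(stl) == len(lower_is_better)) or not mtl:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for m, b, low in zip(mtl, stl, lower_is_better):
        sign = -1.0 if low else 1.0
        total += sign * (m - b) / b
    return 100.0 * total / len(mtl)


# Toy numbers (hypothetical, not from the paper): one accuracy-like metric
# (higher is better) and one error-like metric (lower is better).
print(delta_p([0.55, 0.40], [0.50, 0.50], [False, True]))  # ≈ 15.0
```

Flipping the sign per metric lets accuracy-type and error-type metrics be averaged on one scale. The assumed definition is just a flat mean of those signed relative changes.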