finding
active
Setting αk to the maximum gradient norm performs best among tested strategies on NYUv2 (Figure 6).
Sensitivity analysis for gradient normalization scaling factor.
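The following is a minimal sketch of the kind of scaling this finding refers to. It assumes each task gradient is normalized to unit L2 norm, the normalized gradients are summed, and the sum is rescaled by αk. The function name `combine_normalized_gradients`, the `min` and `mean` alternatives, and the `eps` guard are illustrative assumptions and are not taken from the source.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def combine_normalized_gradients(task_grads, strategy="max", eps=1e-8):
    """Sum unit-normalized task gradients and rescale the sum by alpha.

    task_grads: list of equal-length flat gradient vectors, one per task.
    strategy:   how alpha is derived from the per-task gradient norms.
                "max" mirrors the finding; "min" and "mean" are
                hypothetical alternatives included only for comparison.
    Returns (update, alpha).
    """
    norms = [l2_norm(g) for g in task_grads]
    if strategy == "max":
        alpha = max(norms)
    elif strategy == "min":
        alpha = min(norms)
    elif strategy == "mean":
        alpha = sum(norms) / len(norms)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    dim = len(task_grads[0])
    combined = [0.0] * dim
    for grad, norm in zip(task_grads, norms):
        for i in range(dim):
            # eps (an assumption) avoids division by zero for a vanishing gradient
            combined[i] += grad[i] / (norm + eps)

    return [alpha * c for c in combined], alpha


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.0, 0.5]]  # toy gradients for two tasks
    for name in ("max", "min", "mean"):
        update, alpha = combine_normalized_gradients(grads, strategy=name)
        print(name, alpha, update)
```

With `strategy="max"`, the combined update has the scale of the largest task gradient, so no task's contribution is shrunk below the magnitude the dominant task would have produced on its own.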
Source paper
extracted_from · (2023) · Baijiong Lin · Weisen Jiang · Feiyang Ye · Yu Zhang +5
Neighborhood — ranked by edge-count
Claims (2)
claim
- Recommended strategy for gradient normalization.
- The magnitude of the normalized gradients (choice of αk) plays an important role in performance. · supports · Insight about gradient normalization scaling.
Communities (2)
community
- Dual-balancing multi-task learning · members_of · DB-MTL jointly balances loss scale and gradient magnitude, benchmarked on NYUv2 and Office-31.
- Investigates optimal gradient balancing strategies across tasks, finding maximum gradient norm normalization outperforms alternatives in multitask optimization.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Training stability analysis.
- Setting aggregated gradient scaling factor to maximum gradient norm performs best for task balancing · claim · 0.770 · Empirical finding on choice of αk in gradient normalization strategy
- The gradient-magnitude balancing method outperforms GradNorm on NYUv2, Cityscapes, Office-31, Office-Home. · finding · 0.759 · Comparison of gradient-magnitude balancing with GradNorm.
- Advantage over GradNorm.
- Combining loss-scale and gradient-magnitude balancing achieves Δp = +1.15±0.16 on NYUv2. · finding · 0.752 · Full DB-MTL ablation result.
- Proposed conjecture in §4.3.1.
- Claim about current practical feasibility and efficiency of 2-way associative implementations.
- Computational efficiency comparison.