quote

active


As can be seen, most MTL baselines perform better than STL on semantic segmentation and depth estimation, but have a large drop on the surface normal prediction task, suffering from the task balancing problem.

Observation illustrating the task balancing problem on NYUv2.

Source paper

extracted_from

Dual-Balancing for Multi-Task Learning

(2023) · Baijiong Lin · Weisen Jiang · Feiyang Ye · Yu Zhang +5

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DB-MTL reduces gradient conflict and improves task balance compared to EW. · claim · 0.792
Effect on gradient conflict.
DB-MTL achieves loss-scale balancing by performing logarithm transformation on each task loss, and rescales gradient magnitudes by normalizing all task gradients to comparable magnitudes using the maximum gradient norm. · quote · 0.790
Concise summary of the DB-MTL method from the abstract.
There are fewer representations competent for N tasks than M<N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.765
Selective pressure toward convergence via task generality
DB-MTL increases gradient cosine similarity faster and keeps it positive on Office-31, reducing gradient conflict vs EW. · finding · 0.764
Analysis of gradient conflict reduction.
DB-MTL has similar per-epoch running time to gradient balancing methods on NYUv2, slower than loss balancing methods. · finding · 0.762
Computational efficiency comparison.
We hypothesize that degraded generalization on benchmarks like MMLU may reflect the computational demands of the tasks. · hypothesis · 0.758
Connecting the paper's task-difficulty findings to prior observations of weak generalization on complex QA benchmarks.
Under spatio permutation controls, IIT consciousness estimates outperform Span Representation in mean AUC in several cases (LLaMA3.1-70B on Hinting and Irony, Mistral-7B on Irony, LLaMA3.1-8B on Strange Stories). · finding · 0.752
Contrasts with temporal permutation where Span Representation dominates; suggests spatio permutation reveals different dynamics.
DB-MTL with EMA forgetting rate β in a wide range performs better than without EMA (β=0) on Office-31. · finding · 0.750
Effect of EMA forgetting rate on performance.
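
The DB-MTL summary quoted in the list above (a logarithm transformation on each task loss, then rescaling all task gradients using the maximum gradient norm) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dual_balanced_gradient` and the `eps` parameter are hypothetical, per-task gradients are assumed to be precomputed and flattened into arrays, "normalizing using the maximum gradient norm" is read here as rescaling every task gradient to the largest norm, and the EMA smoothing mentioned in the list is left out.

```python
import numpy as np

def dual_balanced_gradient(losses, grads, eps=1e-8):
    """Illustrative sketch only: combine per-task gradients as the quoted summary describes."""
    losses = np.asarray(losses, dtype=float)   # shape (num_tasks,)
    grads = np.asarray(grads, dtype=float)     # shape (num_tasks, num_params)

    # Loss-scale balancing: the gradient of log(loss_k) is grad_k / loss_k.
    log_grads = grads / (losses[:, None] + eps)

    # Gradient-magnitude balancing (assumed reading): rescale every task
    # gradient so its norm matches the largest norm among the tasks.
    norms = np.linalg.norm(log_grads, axis=1)
    scale = norms.max() / (norms + eps)
    balanced = log_grads * scale[:, None]

    # Return the summed update and the per-task norms after balancing.
    return balanced.sum(axis=0), np.linalg.norm(balanced, axis=1)
```

With two tasks whose losses differ by a factor of 100, for instance `dual_balanced_gradient([1.0, 100.0], [[3.0, 4.0], [0.0, 50.0]])`, both task gradients end up with the same norm before they are summed, so neither task dominates the update through its loss scale alone.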