method
active
method:multiple-gradient-descent-algorithm-mgda
Multiple Gradient Descent Algorithm (MGDA)
Balances per-task gradients by solving a multi-objective optimization problem: the aggregated update is the minimum-norm vector in the convex hull of the task gradients.
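The min-norm problem can be written as choosing weights α on the probability simplex that minimize ||Σ_t α_t g_t||². Below is a minimal sketch that assumes gradients are plain Python lists. With two tasks the optimum has a closed form. With more tasks, the sketch swaps in a Frank-Wolfe solver, which is one common choice and not necessarily the solver used by any particular paper. The helper names `dot`, `aggregate`, `min_norm_two` and `mgda_weights` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate(grads: Sequence[Sequence[float]], alpha: Sequence[float]) -> Vector:
    """Convex combination sum_t alpha[t] * grads[t]."""
    dim = len(grads[0])
    return [sum(a * g[k] for a, g in zip(alpha, grads)) for k in range(dim)]


def min_norm_two(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Closed-form gamma in [0, 1] minimizing ||gamma*g1 + (1-gamma)*g2||^2."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # g1 == g2: every gamma gives the same point; 0.0 means "no move".
        return 0.0
    gamma = dot([b - a for a, b in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, gamma))


def mgda_weights(grads: Sequence[Sequence[float]], iters: int = 100,
                 tol: float = 1e-9) -> Vector:
    """Hypothetical Frank-Wolfe solver for min over the simplex of ||sum_t alpha_t g_t||^2."""
    num_tasks = len(grads)
    alpha = [1.0 / num_tasks] * num_tasks  # start from uniform weights
    for _ in range(iters):
        combo = aggregate(grads, alpha)
        # The linearized objective is minimized at the vertex (task gradient)
        # with the smallest inner product with the current aggregate.
        r = min(range(num_tasks), key=lambda i: dot(grads[i], combo))
        # Exact line search on the segment between combo and grads[r].
        gamma = min_norm_two(grads[r], combo)
        if gamma < tol:
            break  # no further decrease along the Frank-Wolfe direction
        alpha = [(1.0 - gamma) * a for a in alpha]
        alpha[r] += gamma
    return alpha


if __name__ == "__main__":
    g = [[1.0, 0.0], [0.0, 1.0]]
    w = mgda_weights(g)
    print(w, aggregate(g, w))  # [0.5, 0.5] [0.5, 0.5]
```

The shared parameters would then be updated along the negative of the aggregated vector d = Σ_t α_t g_t. Frank-Wolfe is used here because every iterate stays on the simplex without a projection step. Each step only needs inner products between gradients.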
Neighborhood (ranked by edge count)
Artifacts (1)
artifact
- The paper proposing the Dual-Balancing Multi-Task Learning method.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood that have no typed relation to this one. They are candidates for new edges or may be unrecognized duplicates.
- Used for updating hidden-state expectations; provides a dynamical process theory that can be tested against neuronal data.
- RL algorithm used to train baseline agents in the physical deception environment.
- Baseline method against which probe-based ranking is compared; more computationally expensive.
- Optimization technique that computes weight changes by following the gradient of an error function; contrasted with evolutionary stochastic search.
- Optimization procedure for simultaneously updating action selection and perception; uses step size ζ (default 4).
- DAS uses SGD over differentiable parameterizations of orthogonal matrices (via PyTorch) to find optimal distributed alignments.
- Using language model log probabilities of answer choices (A)/(B) to produce preference labels.
- We hypothesize that degraded generalization on benchmarks like MMLU may reflect the computational demands of the tasks.
  hypothesis · cosine 0.731 · Connects the paper's task-difficulty findings to prior observations of weak generalization on complex QA benchmarks.