hypothesis
active
If EI maximization is used as a regularization in representation learning, then OOD generalization will improve beyond current invariant risk minimization methods.
Proposed conjecture in §4.3.1.
Source paper
extracted_from · (2023) · Bing Yuan · Jiang Zhang · Aobo Lyu · Jiayun Wu +5
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Effective Information (EI) · associated_with · Core measure of causal effect in Hoel's theory; the mutual information between input and output when the input is set to a uniform distribution.
- Out-of-Distribution (OOD) Generalization · associated_with · Machine learning generalization when training and test distributions differ; linked to causal invariance.
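As a rough illustration of the EI definition above, the following sketch computes EI for a discrete system described by a row-stochastic transition matrix. The function name `effective_information` and the two example matrices are hypothetical and chosen only for illustration; they do not come from the source paper, and the hypothesis itself concerns learned representations rather than small tabular systems.

```python
import math


def effective_information(tpm):
    """Return EI in bits for a row-stochastic transition matrix.

    EI is computed as the mutual information between input and output
    when the input state is drawn uniformly (illustrative helper, not
    the source paper's code).
    """
    n = len(tpm)
    n_out = len(tpm[0])
    # Output distribution obtained when every input state is equally likely.
    avg_out = [sum(row[j] for row in tpm) / n for j in range(n_out)]
    ei = 0.0
    for row in tpm:
        for p, q in zip(row, avg_out):
            if p > 0.0:  # terms with p == 0 contribute nothing
                ei += p * math.log2(p / q) / n
    return ei


# Hypothetical example: a deterministic 4-state cycle (a permutation).
deterministic = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

# Hypothetical example: every state jumps uniformly at random.
noisy = [[0.25] * 4 for _ in range(4)]

ei_deterministic = effective_information(deterministic)
ei_noisy = effective_information(noisy)
```

Under these assumptions the deterministic cycle reaches the maximum of log2(4) = 2 bits, while the fully random matrix gives 0 bits, since its output carries no information about the input.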
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- EI and normalized EI could serve as a unified metric for out-of-distribution generalization. · claim · 0.837 · Conjecture that maximizing EI yields causal representations invariant to distribution shifts.
- EI maximization serves as an objective standard for selecting coarse-graining and macro-dynamics. · claim · 0.825 · Claim by Hoel et al. and endorsed by this survey; used to counter subjectivity critiques.
- Ethical implication about the nature of AI training experience if the thesis holds
- Selective pressure toward convergence via task generality
- Training on cities+neg_cities improves OOD generalization, especially on neg_sp_en_trans · finding · 0.753 · Training on statements and their negations mitigates non-truth feature interference in probe directions.
- Equivalence of optimal predictor to the physics of the data.
- Setting αk to the maximum gradient norm performs best among tested strategies on NYUv2 (Figure 6). · finding · 0.751 · Sensitivity analysis for gradient normalization scaling factor.
- Overarching three-part hypothesis stated in introduction