hypothesis
active
We hypothesized that divergence could influence IIA when transferring the DAS alignment to OOD settings
Motivating hypothesis for the OOD experiment testing practical utility of divergence reduction
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Findings (2)
finding
- Linear regression of OOD IIA on training EMD yields coefficient -0.3424, R^2=0.729, F(1,28)=75.28, p<.001 · associated_with · supports · Statistical evidence that training divergence (EMD) predicts lower OOD intervention performance
- Key practical utility result: CL loss improves generalization of alignment to out-of-distribution settings
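The regression statistics reported in the first finding can be sanity-checked against each other. For a simple linear regression with one predictor, F and R^2 are tied by the standard identity F = (R^2 / (1 - R^2)) * df_resid. The sketch below applies that identity to the reported values; the function names are hypothetical, and this is only a consistency check, not a reconstruction of the authors' analysis.

```python
def f_from_r2(r2: float, df_resid: int) -> float:
    """F statistic of a one-predictor OLS regression, derived from R^2.

    Assumes a single predictor (numerator df = 1), matching F(1, 28).
    """
    return (r2 / (1.0 - r2)) * df_resid


def r2_from_f(f_stat: float, df_resid: int) -> float:
    """Inverse of f_from_r2: recover R^2 from F for one predictor."""
    return f_stat / (f_stat + df_resid)


# Values as reported in the finding: R^2 = 0.729, F(1, 28) = 75.28.
reported_r2 = 0.729
reported_f = 75.28
df_resid = 28  # residual degrees of freedom, i.e. n - 2 with n = 30

implied_f = f_from_r2(reported_r2, df_resid)   # about 75.3
implied_r2 = r2_from_f(reported_f, df_resid)   # about 0.729

print(f"F implied by R^2: {implied_f:.2f}")
print(f"R^2 implied by F: {implied_r2:.3f}")
```

The two reported values agree up to rounding, and the residual degrees of freedom imply 30 observations in the regression.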
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Authors connect their finding to the prior probing literature debate
- Second central claim of the paper.
- Empirical support for vacuousness of unrestricted causal abstraction
- Future work hypothesis about extending SOO to direct value alignment
- Replication of Wu et al. 2023 finding; DAS expressivity concern validated in CausalGym setup
- Authors identify this as the most uncertain and important question for future work
- Practical utility of reducing divergence demonstrated through regression analysis
- Central claim motivating DAS over prior methods.