claim
active
claim:representational-divergence-as-measured-by-emd-can-predict-lower-out-of-distribution-intervention-performance
Representational divergence (as measured by EMD) can predict lower out-of-distribution intervention performance
Practical utility of reducing divergence demonstrated through regression analysis
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Findings (1)
finding
- Linear regression of OOD IIA on training EMD yields coefficient -0.3424, R^2 = 0.729, F(1, 28) = 75.28, p < .001 · associated_with · supports · Statistical evidence that training divergence (EMD) predicts lower OOD intervention performance
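The statistics in this finding are those of a simple linear regression with one predictor. The residual degrees of freedom of 28 imply 30 observations, and for a single predictor F = R^2 / (1 - R^2) × df, which gives about 75.3 for R^2 = 0.729, in line with the reported 75.28 up to rounding. The following is a minimal sketch of that computation, not the authors' analysis code: the function names `simple_ols` and `f_from_r2` are hypothetical, and the data are synthetic stand-ins generated here, not the paper's EMD and IIA measurements.

```python
import random


def f_from_r2(r2, df_resid):
    """F statistic for a one-predictor regression, derived from R^2."""
    return r2 / (1.0 - r2) * df_resid


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 is one minus the ratio of residual to total sum of squares.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    df_resid = n - 2  # one slope and one intercept are estimated
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
        "df_resid": df_resid,
        "f": f_from_r2(r2, df_resid),
    }


# Synthetic illustration only: 30 hypothetical (training EMD, OOD IIA) pairs
# with a negative trend. These are NOT the values from the source paper.
rng = random.Random(0)
emd = [rng.uniform(0.0, 1.0) for _ in range(30)]
iia = [0.9 - 0.34 * e + rng.gauss(0.0, 0.05) for e in emd]

fit = simple_ols(emd, iia)
print(fit)

# Consistency check on the reported statistics (R^2 = 0.729, df = 28).
print(f_from_r2(0.729, 28))
```

A negative slope in such a fit means that higher training divergence goes with lower OOD IIA, which is the direction the finding reports.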
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
- Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
- Interpretive claim from Case Study II about the distinction between correlational probes and causal interventions
- DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset · finding · 0.755 · Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset
- EI and normalized EI could serve as a unified metric for out-of-distribution generalization. · claim · 0.748 · Conjecture that maximizing EI yields causal representations invariant to distribution shifts.
- Proposed conjecture in §4.3.1.
- Synthetic example showing an intervention that appears safe in tested contexts but causes behavior changes in others
- Key interpretive claim that deception has a tractable geometric signature in activation space
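Several entries above refer to EMD (earth mover's distance) measured along feature dimensions between natural and intervened representations. This page does not say how the source paper computes it, so the following is only a sketch under stated assumptions: each feature dimension is treated independently, both sample sets have the same size, and the per-dimension distances are averaged. For equal-size one-dimensional samples, the EMD equals the mean absolute difference between the sorted values. The names `emd_1d` and `mean_featurewise_emd` are hypothetical.

```python
def emd_1d(a, b):
    """1-D earth mover's distance between two equal-size samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    # Optimal 1-D transport matches the i-th smallest of a to the i-th smallest of b.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def mean_featurewise_emd(natural, intervened):
    """Average the per-dimension EMD; rows are samples, columns are features."""
    dims = len(natural[0])
    per_dim = [
        emd_1d([row[d] for row in natural], [row[d] for row in intervened])
        for d in range(dims)
    ]
    return sum(per_dim) / dims, per_dim


# Toy example: the intervention shifts feature 0 by 0.5 and leaves feature 1 alone.
natural = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
intervened = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]

mean_emd, per_dim = mean_featurewise_emd(natural, intervened)
print(mean_emd, per_dim)  # 0.25 [0.5, 0.0]
```

Treating dimensions independently ignores joint structure across features; a full multivariate EMD would require solving an optimal transport problem instead.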