claim

active

claim:representational-divergence-as-measured-by-emd-can-predict-lower-out-of-distribution-intervention-performance

Representational divergence (as measured by EMD) can predict lower out-of-distribution intervention performance

Practical utility of reducing divergence demonstrated through regression analysis

Source paper

extracted_from

Addressing divergent representations from causal interventions on neural networks

(2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts

Neighborhood — ranked by edge-count

Findings (1)

finding

Linear regression of OOD IIA on training EMD yields coefficient -0.3424, R^2=0.729, F(1,28)=75.28, p<.001
associated_with · supports
Statistical evidence that training divergence (EMD) predicts lower OOD intervention performance
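The reported statistics can be sanity-checked against each other. The following is a minimal sketch, not the authors' analysis code: for a regression with a single predictor, the F statistic is determined by R^2 and the residual degrees of freedom, F = R^2 * df / (1 - R^2). The helper names `simple_ols` and `f_statistic` are hypothetical, and the snippet only uses the figures quoted in the finding above (it does not use the paper's raw data, which are not available here).

```python
def simple_ols(x, y):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept, r_squared). Hypothetical helper for
    illustration; any statistics package would do the same job.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def f_statistic(r_squared, df_resid):
    """F statistic of a single-predictor regression, from R^2 alone."""
    return r_squared * df_resid / (1.0 - r_squared)


# Figures quoted in the finding: R^2 = 0.729 with F(1, 28), i.e. 30 points.
f_from_r2 = f_statistic(0.729, 28)
print(f"F implied by the rounded R^2: {f_from_r2:.2f} (reported: 75.28)")
```

The implied value is about 75.3, which agrees with the reported F(1,28)=75.28 up to the rounding of R^2. With per-run pairs of training EMD and OOD IIA in hand, `simple_ols(emd, iia)` would give the slope to compare against the reported coefficient of -0.3424.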

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Representational Divergence · concept · 0.799
Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
Divergent representations are a common, if not likely, outcome of causal interventions across a wide range of methods · claim · 0.780
Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
Trainable intervention (DAS) finds sparser gender representations than linear probing, suggesting probing overestimates causal coverage · claim · 0.763
Interpretive claim from Case Study II about the distinction between correlational probes and causal interventions
DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset · finding · 0.755
Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset
EI and normalized EI could serve as a unified metric for out-of-distribution generalization. · claim · 0.748
Conjecture that maximizing EI yields causal representations invariant to distribution shifts.
If EI maximization is used as a regularization in representation learning, then OOD generalization will improve beyond current invariant risk minimization methods. · hypothesis · 0.743
Proposed conjecture in §4.3.1.
An intervention benign at context v4<0.75 produces a class-C behavioral flip at 0.75<v4<1, demonstrating dormant behavioral changes from latent divergence · finding · 0.742
Synthetic example showing an intervention that appears safe in tested contexts but causes behavior changes in others
Representation engineering successfully quantifies deception via high-accuracy steering vectors, establishing it as a measurable property of model representations · claim · 0.742
Key interpretive claim that deception has a tractable geometric signature in activation space