finding

active

finding:modified-cl-loss-produces-emd-along-feature-dimensions-of-0-007-0-001-on-synthetic-10-class-dataset

Modified CL loss produces EMD along feature dimensions of 0.007±0.001 on synthetic 10-class dataset

Quantitative reduction in divergence achieved by the modified CL loss on the synthetic dataset
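As a rough illustration of what an EMD (earth mover's distance) "along feature dimensions" can measure, the sketch below computes a one-dimensional EMD separately for each feature dimension of two sets of vectors and averages the results. This is an assumption about the general shape of such a metric, not the paper's exact procedure. The function names `emd_1d` and `mean_featurewise_emd` and the toy data are hypothetical.

```python
import random


def emd_1d(xs, ys):
    """1D earth mover's distance between two equal-sized samples.

    For equal-sized empirical samples, the 1D EMD equals the mean
    absolute difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-sized samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_featurewise_emd(natural, intervened):
    """Average the per-dimension 1D EMD over all feature dimensions.

    Both arguments are lists of equal-length vectors (rows = samples).
    Averaging over dimensions is an assumption made for illustration.
    """
    n_dims = len(natural[0])
    per_dim = [
        emd_1d([row[d] for row in natural], [row[d] for row in intervened])
        for d in range(n_dims)
    ]
    return sum(per_dim) / n_dims


# Toy data: 200 samples with 4 feature dimensions (hypothetical values).
rng = random.Random(0)
natural = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
# Shift every feature by 0.5 to mimic representations that have drifted.
shifted = [[v + 0.5 for v in row] for row in natural]

print(mean_featurewise_emd(natural, natural))            # 0.0
print(round(mean_featurewise_emd(natural, shifted), 6))  # 0.5
```

Under this reading, identical distributions score 0 and a uniform shift of 0.5 in every dimension scores 0.5, so smaller values indicate intervened representations that stay closer to the natural distribution.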

Source paper

extracted_from

Addressing divergent representations from causal interventions on neural networks

(2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts

Neighborhood — ranked by edge-count

Findings (1)

finding

DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset
supports
Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Modified CL loss achieves IIA of 0.9988±0.0005 on synthetic 10-class dataset training/test sets · finding · 0.823
IIA for modified CL loss on synthetic dataset, comparable to behavioral DAS
For small CL loss weights epsilon, IIA is maintained (potentially improved) while EMD decreases in Boundless DAS on a 7B LLM · finding · 0.785
Empirical result showing the CL loss can reduce divergence without sacrificing interchange intervention accuracy (IIA)
Modified CL loss outperforms behavioral DAS loss in OOD transfer from dense to sparse class partition · finding · 0.775
Key practical utility result: CL loss improves generalization of alignment to out-of-distribution settings
The modified CL loss is confined to a narrow set of simplistic settings and is not specific to pernicious divergence · concept · 0.755
Explicitly identified limitation of the proposed mitigation method
Linear regression of OOD IIA on training EMD yields coefficient -0.3424, R^2=0.729, F(1,28)=75.28, p<.001 · finding · 0.739
Statistical evidence that training divergence (EMD) predicts lower OOD intervention performance
Modified CL Loss · framework · 0.738
Novel variant of CL loss introduced in this paper targeting only causal subspace dimensions to improve OOD performance
Mean difference patching on Llama-3-8B layer 10 produces intervened EMD exceeding the natural-natural baseline · finding · 0.735
Empirical demonstration that MDVP produces divergent representations in a real LLM
DAS behavioral loss achieves IIA of 0.997±0.001 on synthetic 10-class dataset training/test sets · finding · 0.727
IIA baseline for DAS behavioral loss on synthetic dataset