finding
active
finding:das-behavioral-loss-produces-emd-along-feature-dimensions-of-0-032-0-003-on-synthetic-10-class-dataset
DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset
Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset
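A minimal sketch of what "EMD along feature dimensions" could mean in practice. This page does not state how the source paper computes the metric, so the code assumes one common reading: a 1-D earth mover's (Wasserstein-1) distance between natural and intervened representations for each feature dimension, averaged over dimensions. The function names `emd_1d` and `mean_emd_per_dimension` and the toy data are hypothetical.

```python
import random


def emd_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal sample counts this reduces to the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


def mean_emd_per_dimension(natural, intervened):
    """Average the 1-D EMD over feature dimensions.

    Both inputs are lists of equal-length vectors (one vector per sample).
    Assumption: averaging over dimensions; the paper may aggregate differently.
    """
    n_dims = len(natural[0])
    per_dim = [
        emd_1d([row[d] for row in natural], [row[d] for row in intervened])
        for d in range(n_dims)
    ]
    return sum(per_dim) / n_dims


# Toy data only: 500 samples of 4-dimensional Gaussian "representations".
rng = random.Random(0)
natural = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
shifted = [[v + 0.5 for v in row] for row in natural]

print(mean_emd_per_dimension(natural, natural))  # identical samples give 0.0
print(mean_emd_per_dimension(natural, shifted))  # a constant shift of 0.5 gives about 0.5
```

Under this reading, a lower value means the intervened representations stay closer to the natural distribution, which is the sense in which 0.032±0.003 serves as the baseline here.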
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Findings (1)
finding
- Modified CL loss produces EMD along feature dimensions of 0.007±0.001 on synthetic 10-class dataset · supports · Quantitative improvement in divergence reduction using the modified CL loss on synthetic dataset
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- DAS behavioral loss achieves IIA of 0.997±0.001 on synthetic 10-class dataset training/test sets · finding · 0.854 · IIA baseline for DAS behavioral loss on synthetic dataset
- Empirical result showing the CL loss can reduce divergence without sacrificing interpretability accuracy
- Modified CL loss outperforms behavioral DAS loss in OOD transfer from dense to sparse class partition · finding · 0.761 · Key practical utility result: CL loss improves generalization of alignment to out-of-distribution settings
- Practical utility of reducing divergence demonstrated through regression analysis
- Empirical demonstration that DAS interventions produce divergent representations
- Linear regression of OOD IIA on training EMD yields coefficient -0.3424, R^2=0.729, F(1,28)=75.28, p<.001 · finding · 0.745 · Statistical evidence that training divergence (EMD) predicts lower OOD intervention performance
- GRU behavior can be compressed to as few as 4 dimensions using DAS and MAS with comparable IIAs · finding · 0.742 · Shows that behaviorally relevant information is low-dimensional; contrasted with model stitching achieving near-perfect IIA at rank 2.
- Empirical demonstration that MDVP produces divergent representations in a real LLM
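
The regression statistics in the list above (R^2=0.729, F(1,28)=75.28) can be checked against each other. For an ordinary least-squares fit with an intercept, R^2 = F·df1 / (F·df1 + df2). The sketch below applies that identity to the reported numbers only. The helper name `r_squared_from_f` is hypothetical, and no underlying data is reproduced or assumed.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from an OLS F-statistic and its degrees of freedom."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


# Values as reported in the regression finding: F(1, 28) = 75.28.
r2 = r_squared_from_f(75.28, 1, 28)
print(round(r2, 3))  # 0.729, matching the reported R^2

# With one predictor plus an intercept, residual df = n - 2,
# so F(1, 28) implies 30 observations.
n_observations = 28 + 2
print(n_observations)  # 30
```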