finding
active
finding:for-small-cl-loss-weights-epsilon-iia-is-maintained-potentially-improved-while-emd-decreases-in-boundless-das-on-a-7b-llm
For small CL loss weights epsilon, IIA is maintained (potentially improved) while EMD decreases in Boundless DAS on a 7B LLM
Empirical result showing the CL loss can reduce divergence without sacrificing interpretability accuracy
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Claims (1)
claim
- The CL auxiliary loss can directly reduce representational divergence in practical interpretability settings without sacrificing interpretability method accuracy · associated_with · supports · Central practical contribution: the CL loss offers a viable mitigation strategy
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Modified CL loss produces EMD along feature dimensions of 0.007±0.001 on synthetic 10-class dataset · finding · 0.785 · Quantitative improvement in divergence reduction using the modified CL loss on synthetic dataset
- DAS behavioral loss achieves IIA of 0.997±0.001 on synthetic 10-class dataset training/test sets · finding · 0.784 · IIA baseline for DAS behavioral loss on synthetic dataset
- Modified CL loss achieves IIA of 0.9988±0.0005 on synthetic 10-class dataset training/test sets · finding · 0.780 · IIA for modified CL loss on synthetic dataset, comparable to behavioral DAS
- Modified CL loss outperforms behavioral DAS loss in OOD transfer from dense to sparse class partition · finding · 0.775 · Key practical utility result: CL loss improves generalization of alignment to out-of-distribution settings
- DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset · finding · 0.766 · Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset
- Training stability analysis.
- DB-MTL with EMA forgetting rate β in a wide range performs better than without EMA (β=0) on Office-31. · finding · 0.764 · Effect of EMA forgetting rate on performance.
- Proof-of-principle that MAS can detect model misalignment in DeepSeek-R1-Qwen-1.5B fine-tuned models.