framework
active
framework:modified-cl-loss
Modified CL Loss
Novel variant of the CL loss, introduced in this paper, that targets only the causal subspace dimensions to improve out-of-distribution (OOD) performance
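The description above can be read as a distance penalty that is measured only along the causal subspace. The following is a minimal sketch of that reading, not the paper's implementation: the function names, the use of a mean squared distance, and the assumption that the causal subspace is given as orthonormal basis rows are all hypothetical choices made for illustration.

```python
def project(vec, basis):
    """Coordinates of vec along each basis row (rows assumed orthonormal)."""
    return [sum(b_i * v_i for b_i, v_i in zip(row, vec)) for row in basis]


def subspace_cl_loss(intervened, natural, causal_basis):
    """Mean squared distance between two representations, measured only
    along the causal subspace dimensions.

    Assumption (hypothetical): the penalty is a plain squared L2 distance.
    The source does not state the exact distance used.
    """
    z_int = project(intervened, causal_basis)
    z_nat = project(natural, causal_basis)
    return sum((a - b) ** 2 for a, b in zip(z_int, z_nat)) / len(z_int)


# Toy example: a 3-d representation whose first two axes form the causal subspace.
basis = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]

# The vectors differ only on the non-causal third axis, so nothing is penalized.
loss_noncausal = subspace_cl_loss([1.0, 2.0, 5.0], [1.0, 2.0, -3.0], basis)

# The vectors differ on a causal axis, so the difference is penalized.
loss_causal = subspace_cl_loss([2.0, 2.0, 0.0], [1.0, 2.0, 0.0], basis)
```

Under these assumptions, a mismatch outside the causal subspace leaves the loss at zero, which is the sense in which the objective "targets only" the causal dimensions.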
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
Frameworks (1)
framework
- Auxiliary training objective from Grant (2025) that constrains intervened representations to remain near natural distribution
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Explicitly identified limitation of the proposed mitigation method
- Auxiliary objective combining L2 and cosine losses against pre-recorded CL vectors to improve causal relevance when one model is causally inaccessible.
- Modified CL loss achieves IIA of 0.9988±0.0005 on synthetic 10-class dataset training/test sets · finding · 0.758 · IIA for modified CL loss on synthetic dataset, comparable to behavioral DAS
- Modified CL loss outperforms behavioral DAS loss in OOD transfer from dense to sparse class partition · finding · 0.750 · Key practical utility result: CL loss improves generalization of alignment to out-of-distribution settings
- Modified CL loss produces EMD along feature dimensions of 0.007±0.001 on synthetic 10-class dataset · finding · 0.738 · Quantitative improvement in divergence reduction using the modified CL loss on synthetic dataset
- Regularization component of the composite loss that penalizes deviation from baseline model behavior on Alpaca instructions
- Loss computed using continuous relaxations of logic gates during training
- In machine learning, a function measuring the distance between current and desired output; analogous to stress.