method
active
method:log-odds-ratio
Log odds-ratio
Primary evaluation metric measuring the causal effect of interventions; a greater value indicates a larger causal effect.
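As a rough illustration of how such a metric behaves, the sketch below computes a log odds-ratio from four probabilities. It assumes one common form, which the entry itself does not spell out: the odds of the original output over the counterfactual output before the intervention, multiplied by the odds of the counterfactual over the original after it. The function name `log_odds_ratio` and its parameter names are hypothetical, chosen only for this example.

```python
import math


def log_odds_ratio(p_orig_before, p_alt_before, p_orig_after, p_alt_after):
    """Log odds-ratio of an intervention (assumed form, not taken from the entry).

    p_orig_before / p_alt_before: probabilities of the original and the
        counterfactual output without the intervention.
    p_orig_after / p_alt_after: the same two probabilities with the
        intervention applied.
    """
    # Odds favouring the original output before intervening.
    odds_before = p_orig_before / p_alt_before
    # Odds favouring the counterfactual output after intervening.
    odds_after = p_alt_after / p_orig_after
    return math.log(odds_before * odds_after)


# An intervention that leaves the probabilities unchanged scores about 0.
no_effect = log_odds_ratio(0.8, 0.2, 0.8, 0.2)

# An intervention that flips the preference scores well above 0.
strong_effect = log_odds_ratio(0.8, 0.2, 0.2, 0.8)
```

Under this assumed form, an intervention that changes nothing yields a value near zero, and the value grows as the intervention shifts probability toward the counterfactual output, which matches the "greater value, larger causal effect" reading above.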
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- CausalGym (uses): Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Model evidence approximated by free energy bound, used for model selection.
- Parameter-free loss transformation applied to each task loss to equalize scales
- Mathematical equivalence showing logarithm transformation recovers IMTL-L in the limit
- The outer edge space that protects the text block but can be invaded by marginalia.
- Experimental variable sweeping proportion of self-correction to normal response training examples from 10% to 90%
- Measure of expected sensory input, core to linking value and surprise.
- Justification for the linear combination
- The goal of making model behavior match human values and intentions, often addressed during post-training.