Interchange Intervention Accuracy (IIA) Metric

Metric measuring how often a DNN, under interchange interventions, matches the outputs predicted by the algorithm on a held-out test set
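As a rough illustration of the definition above, the sketch below computes IIA for a toy setup. Everything in it is an assumption made for illustration: the "algorithm" is a hand-written function with one intermediate variable `s = a + b`, the "DNN" is a stand-in function with a two-unit hidden state, and the names `high_level`, `low_level_hidden`, `interchange_intervention`, and `iia` are hypothetical. The exhaustive list of base/source pairs stands in for a held-out test set.

```python
import itertools

# Hypothetical high-level algorithm: s = a + b, output = s + c.
def high_level(x, s_override=None):
    a, b, c = x
    s = a + b if s_override is None else s_override
    return s + c

def high_level_s(x):
    # Value of the intermediate variable s on input x.
    a, b, _ = x
    return a + b

# Stand-in for a trained network: a hidden state and a readout.
def low_level_hidden(x):
    a, b, c = x
    return [a + b, c]

def low_level_readout(h):
    return h[0] + h[1]

def interchange_intervention(base, source, unit):
    # Run on the base input, but overwrite one hidden unit with the
    # value that unit takes on the source input.
    h = low_level_hidden(base)
    h[unit] = low_level_hidden(source)[unit]
    return low_level_readout(h)

def iia(pairs, unit):
    # Fraction of (base, source) pairs where the intervened network
    # output equals the intervened algorithm output.
    hits = 0
    for base, source in pairs:
        expected = high_level(base, s_override=high_level_s(source))
        actual = interchange_intervention(base, source, unit)
        hits += int(expected == actual)
    return hits / len(pairs)

inputs = list(itertools.product(range(3), repeat=3))
test_pairs = list(itertools.product(inputs, inputs))

iia_aligned = iia(test_pairs, unit=0)     # s aligned with hidden unit 0
iia_misaligned = iia(test_pairs, unit=1)  # s aligned with the wrong unit
```

In this toy case the alignment of `s` with hidden unit 0 gives an IIA of 1.0, while aligning `s` with unit 1 gives a lower score, which is the sense in which IIA grades a proposed alignment.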

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interchange Intervention Accuracy (IIA) · concept · 0.942
Evaluation metric measuring how well a trained intervention matches desired counterfactual model behavior
Interchange Intervention Accuracy · method · 0.844
Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
Interchange Intervention Training (IIT) · method · 0.811
Training technique that induces specific causal structures in neural networks by co-training with interchange interventions
Interchange Intervention Training Objective · method · 0.772
Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
Interchange Intervention · method · 0.760
Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
Distributed Interchange Intervention · method · 0.748
Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
What is the appropriate metric for measuring representational alignment, given active debate on merits and deficiencies of all proposed measures? · question · 0.743
Open methodological question acknowledged as limitation
Near-perfect IIA can be achieved on randomly initialised models that cannot solve the task, suggesting causal alignment does not require task capability · claim · 0.738
Empirical support for vacuousness of unrestricted causal abstraction
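
The rotate, intervene, rotate-back procedure described in the Distributed Interchange Intervention entry above can be sketched in two dimensions. This is a minimal illustration under stated assumptions, not the method's reference implementation: the rotation is a fixed angle here, whereas in practice the change of basis would typically be learned, and the function names `rotate` and `distributed_interchange` are hypothetical.

```python
import math

def rotate(vec, theta):
    # Rotate a 2-D vector counter-clockwise by theta radians.
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def distributed_interchange(base_h, source_h, theta, dims):
    # 1. Rotate both hidden states into the new basis.
    rotated_base = list(rotate(base_h, theta))
    rotated_source = rotate(source_h, theta)
    # 2. Intervene: copy the chosen rotated coordinates from source to base.
    for d in dims:
        rotated_base[d] = rotated_source[d]
    # 3. Rotate back to the original (neuron) basis.
    return rotate(tuple(rotated_base), -theta)

# Assumed toy encoding: variable v lies along direction (1, 1) and
# variable w along (1, -1), so neither is stored in a single neuron.
base_h = (3.0, -1.0)    # v = 1, w = 2
source_h = (5.0, 5.0)   # v = 5, w = 0

# Rotating by -pi/4 maps the (1, 1) direction onto axis 0, so swapping
# rotated coordinate 0 replaces v while leaving w untouched.
patched_h = distributed_interchange(base_h, source_h, -math.pi / 4, dims=[0])
```

Here `patched_h` is approximately `(7.0, 3.0)`, which encodes `v = 5` from the source and `w = 2` from the base. A standard interchange intervention on a single neuron could not make this swap, because each neuron mixes both variables.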