Interchange Intervention Accuracy

Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.

Neighborhood — ranked by edge-count

paper

concept

Approximate Causal Abstraction
implements
Graded notion of causal abstraction measured by IIA; when IIA is alpha < 100%, the model is alpha-on-average approximately abstract.

method

Interchange Intervention
related_to
Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interchange Intervention Accuracy (IIA)concept0.871
Evaluation metric measuring how well a trained intervention matches desired counterfactual model behavior
Interchange Intervention Accuracy (IIA) Metricmethod0.844
Metric measuring accuracy of DNN under intervention at matching algorithm-predicted outputs on held-out test set
Interchange Intervention Training Objectivemethod0.830
Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
Distributed Interchange Interventionmethod0.800
Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
Vanilla interchange interventionmethod0.772
Full n-dimensional activation replacement; most expressive intervention tested, used as upper bound in appendix
Interchange Intervention Training (IIT)method0.768
Training technique that induces specific causal structures in neural networks by co-training with interchange interventions
Intervention Sizeconcept0.742
Number of latent variables assigned per algorithm node in distributed abstraction; affects IIA
Exact-Match Accuracy with Flexible Number Extractionmethod0.739
Evaluation metric: proportion of samples with predicted answer exactly matching ground-truth, with flexible number extraction.