Interchange Intervention Accuracy (IIA)

Evaluation metric measuring how well a trained intervention matches desired counterfactual model behavior
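As a rough illustration of the definition above, IIA can be computed as the fraction of (base, source) input pairs for which the model's output under an interchange intervention equals the counterfactual output the high-level algorithm predicts. The sketch below is a toy under stated assumptions, not the evaluation code of any paper: the function names, the mod-3 task, and the two hand-written "models" are all hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Input = Tuple[int, int]


def interchange_intervention_accuracy(
    pairs: Sequence[Tuple[Input, Input]],
    intervened_model: Callable[[Input, Input], int],
    counterfactual_target: Callable[[Input, Input], int],
) -> float:
    """Fraction of (base, source) pairs where the intervened model output
    equals the counterfactual output predicted by the high-level algorithm."""
    if not pairs:
        raise ValueError("need at least one (base, source) pair")
    hits = sum(
        intervened_model(base, source) == counterfactual_target(base, source)
        for base, source in pairs
    )
    return hits / len(pairs)


# Hypothetical high-level algorithm: S = x + y, output = S mod 3.
# Swapping S in from the source makes the output depend on the source only.
def high_level_target(base: Input, source: Input) -> int:
    return (source[0] + source[1]) % 3


# Toy model A (assumed): its hidden value stores x + y, so patching the
# hidden value from the source run reproduces the high-level counterfactual.
def aligned_model(base: Input, source: Input) -> int:
    hidden = source[0] + source[1]  # hidden state taken from the source run
    return hidden % 3


# Toy model B (assumed): its hidden value stores only x; y is still read
# from the base input, so the intervention often disagrees with the target.
def misaligned_model(base: Input, source: Input) -> int:
    hidden = source[0]  # hidden state taken from the source run
    return (hidden + base[1]) % 3


inputs = [(x, y) for x in range(3) for y in range(3)]
pairs = list(product(inputs, inputs))  # every (base, source) combination

iia_aligned = interchange_intervention_accuracy(pairs, aligned_model, high_level_target)
iia_misaligned = interchange_intervention_accuracy(pairs, misaligned_model, high_level_target)
```

In practice the pairs would come from a held-out test set and the intervened model would be a real network with patched activations; the toy functions here only stand in for that machinery.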

Neighborhood — ranked by edge-count

paper

framework

Model Alignment Search (MAS) · uses
The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
Distributed Alignment Search (DAS) · uses
Practical method by Geiger et al. for finding distributed causal abstractions using gradient descent

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Interchange Intervention Accuracy (IIA) Metric · method · 0.942
Metric measuring how often a DNN under intervention matches the algorithm-predicted outputs on a held-out test set
Interchange Intervention Accuracy · method · 0.871
Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
Interchange Intervention Training (IIT) · method · 0.834
Training technique that induces specific causal structures in neural networks by co-training with interchange interventions
Interchange Intervention · method · 0.792
Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
Interchange Intervention Training Objective · method · 0.787
Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
Distributed Interchange Intervention · method · 0.762
Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
Near-perfect IIA can be achieved on randomly initialised models that cannot solve the task, suggesting causal alignment does not require task capability · claim · 0.737
Empirical support for the vacuousness of unrestricted causal abstraction
Intelligence Amplification (IA) · concept · 0.736
The idea of using machines/systems to magnify human intellectual capability; an early AI concept tied to Alexander's Notes.
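
The Distributed Interchange Intervention entry in the list above describes a three-step operation: rotate the representations, intervene in a rotated subspace, then rotate back. A minimal NumPy sketch of that operation follows. It assumes an orthogonal rotation matrix, so the inverse is the transpose, and it uses a random rotation purely for illustration, whereas methods such as DAS learn the rotation by gradient descent. The function name and arguments are hypothetical.

```python
import numpy as np


def distributed_interchange_intervention(
    base_hidden: np.ndarray,
    source_hidden: np.ndarray,
    rotation: np.ndarray,
    subspace_dims: list,
) -> np.ndarray:
    """Swap the chosen rotated coordinates of the base representation with
    those of the source, then map back to the original basis.

    `rotation` is assumed orthogonal, so its transpose undoes the rotation.
    """
    base_rot = rotation @ base_hidden      # step 1: rotate both representations
    source_rot = rotation @ source_hidden
    base_rot[subspace_dims] = source_rot[subspace_dims]  # step 2: intervene
    return rotation.T @ base_rot           # step 3: rotate back


rng = np.random.default_rng(0)
base = rng.standard_normal(4)
source = rng.standard_normal(4)

# Random orthogonal matrix from a QR decomposition (illustrative only).
rotation, _ = np.linalg.qr(rng.standard_normal((4, 4)))

patched = distributed_interchange_intervention(base, source, rotation, [0, 1])
```

With the identity matrix as the rotation, this reduces to an ordinary interchange intervention on the selected neurons; swapping every rotated coordinate recovers the source representation.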