concept
active
concept:interchange-intervention-accuracy-iia
Interchange Intervention Accuracy (IIA)
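Evaluation metric measuring how often a trained intervention produces the desired counterfactual model behavior, i.e. the proportion of interchange interventions whose outputs match the counterfactual targets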
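As a rough illustration of the metric, the sketch below computes IIA on a toy pair of models. It assumes a hypothetical high-level causal model (S = a + b, output = S * c) and a hypothetical hand-built "network" whose hidden unit h[0] computes a + b. An interchange intervention copies hidden values from a source input into the run on a base input, and IIA is the fraction of (base, source) pairs where the intervened network output equals the high-level counterfactual output. All function names here are illustrative, not from any specific library or paper.

```python
import numpy as np

# Hypothetical high-level causal model: S = a + b, output = S * c
def high_level(x, s_override=None):
    a, b, c = x
    s = a + b if s_override is None else s_override
    return s * c

def high_level_s(x):
    return x[0] + x[1]

# Hypothetical low-level "network": hidden h = W @ x, output = h[0] * h[1]
W = np.array([[1.0, 1.0, 0.0],   # h[0] computes a + b (candidate alignment for S)
              [0.0, 0.0, 1.0]])  # h[1] copies c

def low_level(x, h_patch=None, dims=()):
    h = W @ x
    if h_patch is not None:
        h = h.copy()
        h[list(dims)] = h_patch[list(dims)]  # overwrite the intervened units
    return h[0] * h[1]

def interchange_intervention(base, source, dims):
    # Take the hidden values at `dims` from the source run and
    # splice them into the forward pass on the base input.
    h_source = W @ source
    return low_level(base, h_patch=h_source, dims=dims)

def iia(pairs, dims):
    hits = 0
    for base, source in pairs:
        low = interchange_intervention(base, source, dims)
        # High-level counterfactual: set S to its value under the source input
        high = high_level(base, s_override=high_level_s(source))
        hits += int(np.isclose(low, high))
    return hits / len(pairs)

rng = np.random.default_rng(0)
pairs = [(rng.integers(0, 10, 3).astype(float), rng.integers(0, 10, 3).astype(float))
         for _ in range(100)]
print(iia(pairs, dims=[0]))  # h[0] implements S exactly, so IIA is 1.0
print(iia(pairs, dims=[1]))  # h[1] stores c, not S, so IIA drops below 1.0
```

Aligning S with h[0] yields perfect IIA, while the misaligned choice h[1] does not, which is the graded signal the metric is meant to provide.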
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (2)
framework
- The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
- Practical method by Geiger et al. for finding distributed causal abstractions using gradient descent
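Both frameworks intervene in a learned rotated basis rather than on individual neurons. The sketch below shows only the intervention step under assumed conditions: the rotation matrix R is fixed and known in advance (in the described methods it would be learned by gradient descent), and the toy "network" is hypothetical. It stores a + b and c along 45° rotated directions, so no single neuron holds a + b and a standard-basis interchange fails, while intervening on the first rotated coordinate and rotating back recovers the counterfactual.

```python
import numpy as np

theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: Q @ Q.T == I

def encode(x):
    a, b, c = x
    # Hypothetical network: (a + b, c) stored along rotated directions
    return Q.T @ np.array([a + b, c])

def readout(h):
    z = Q @ h
    return z[0] * z[1]  # output = (a + b) * c

def rotated_interchange(h_base, h_source, R, k):
    # Rotate, swap the first k rotated coordinates in from the source,
    # then rotate back (R orthogonal, so its inverse is R.T).
    z = R @ h_base
    z[:k] = (R @ h_source)[:k]
    return R.T @ z

def iia(pairs, R, k):
    hits = 0
    for base, source in pairs:
        h = rotated_interchange(encode(base), encode(source), R, k)
        target = (source[0] + source[1]) * base[2]  # high-level counterfactual
        hits += int(np.isclose(readout(h), target))
    return hits / len(pairs)

rng = np.random.default_rng(0)
pairs = [(rng.integers(0, 10, 3).astype(float), rng.integers(0, 10, 3).astype(float))
         for _ in range(100)]
print(iia(pairs, Q, 1))          # rotated subspace aligned with a + b: 1.0
print(iia(pairs, np.eye(2), 1))  # single neuron in the standard basis: below 1.0
```

A learned method would search over R (and the subspace size k) to maximize IIA instead of assuming the correct rotation up front.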
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Metric measuring how often a DNN under intervention produces the outputs predicted by the high-level algorithm on a held-out test set
- Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
- Training technique that induces specific causal structures in neural networks by co-training with interchange interventions
- Fundamental operation for causal abstraction analysis; sets neurons to the values they take on a source input, creating counterfactuals.
- Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
- Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
- Empirical support for the vacuousness of unrestricted causal abstraction
- The idea of using machines/systems to magnify human intellectual capability, early AI concept tied to Alexander's Notes.