method
active
method:vanilla-interchange-intervention
Vanilla interchange intervention
Replaces the full n-dimensional activation vector. It is the most expressive intervention tested and is used as an upper bound in the appendix.
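A minimal sketch of what full activation replacement could look like, assuming a toy network with one hidden layer and a skip path so that the counterfactual output differs from both the base and the source outputs. The weights and the names `hidden`, `forward`, and `vanilla_interchange` are hypothetical and not taken from the source.

```python
import numpy as np

# Hypothetical toy model: hidden layer of width n = 4 plus an input-to-output skip path.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))      # input -> hidden
W2 = rng.normal(size=(2, 4))      # hidden -> output
W_skip = rng.normal(size=(2, 3))  # input -> output (bypasses the hidden layer)

def hidden(x):
    """Hidden activation: the n-dimensional vector that gets intervened on."""
    return np.tanh(W1 @ x)

def forward(x, h_override=None):
    """Run the model, optionally forcing the hidden activation to a given value."""
    h = hidden(x) if h_override is None else h_override
    return W2 @ h + W_skip @ x

def vanilla_interchange(base, source):
    """Run on `base`, with the entire hidden vector taken from the run on `source`."""
    return forward(base, h_override=hidden(source))

base = np.array([1.0, 0.0, -1.0])
source = np.array([0.5, 2.0, 0.0])
counterfactual = vanilla_interchange(base, source)
```

Using the same input as base and source leaves the output unchanged, which is a quick sanity check for any implementation along these lines.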
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
- Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
- Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
- Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
- Intervention mode where interventions are applied sequentially, each building on the previous one.
- Intervention mode where multiple interventions are applied simultaneously to the same base computation graph.
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
- Intervention targeting specific dimensional subsets of activation vectors rather than full representations
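The rotated-subspace and dimensional-subset interventions listed above can be related to the vanilla case with a short sketch. It assumes an orthogonal rotation `R` and swaps only the first `k` rotated coordinates; with `k = n` the result is the source activation, which matches full replacement. The function name `rotated_interchange` and the random rotation are illustrative assumptions, not part of the source.

```python
import numpy as np

def rotated_interchange(h_base, h_source, R, k):
    """Rotate both activations, swap the first k rotated coordinates, rotate back."""
    r_base = R @ h_base
    r_source = R @ h_source
    r_base[:k] = r_source[:k]   # intervene only in the k-dimensional subspace
    return R.T @ r_base         # R is orthogonal, so R.T undoes the rotation

rng = np.random.default_rng(0)
n = 4
R, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal matrix (assumed basis)
h_base = rng.normal(size=n)
h_source = rng.normal(size=n)

full = rotated_interchange(h_base, h_source, R, k=n)     # equals h_source
partial = rotated_interchange(h_base, h_source, R, k=2)  # mixes base and source
untouched = rotated_interchange(h_base, h_source, R, k=0)  # equals h_base
```

Setting `R` to the identity reduces this to replacing the first `k` raw dimensions, which corresponds to the dimensional-subset case.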