method
active
method:distributed-interchange-intervention

Distributed Interchange Intervention

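Extends interchange interventions to non-standard bases: the representation is rotated into a new basis by an orthogonal change of coordinates, the interchange is applied to a subspace in those rotated coordinates, and the result is rotated back into the original basis.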
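The rotate, intervene, rotate-back sequence can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name, the convention that the rows of `rotation` are the new basis directions, and the choice to intervene on the first `k` rotated coordinates are all assumptions made for the example. In practice the rotation would typically be learned rather than sampled at random as it is here.

```python
import numpy as np

def distributed_interchange_intervention(base, source, rotation, k):
    """Hypothetical helper: swap the first k rotated coordinates of `base`
    with the corresponding coordinates of `source`, then rotate back."""
    # Rotate both activation vectors into the new basis
    # (assumption: rows of `rotation` are the basis directions).
    base_rot = rotation @ base
    source_rot = rotation @ source
    # Intervene in the rotated subspace: take the first k coordinates
    # from the source, keep the remaining coordinates from the base.
    mixed = base_rot.copy()
    mixed[:k] = source_rot[:k]
    # Rotate back; for an orthogonal matrix the inverse is the transpose.
    return rotation.T @ mixed

rng = np.random.default_rng(0)
d, k = 4, 1  # illustrative sizes: 4-dim activations, 1-dim subspace
# A random orthogonal matrix from a QR decomposition (stand-in for a learned rotation).
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
base = rng.normal(size=d)
source = rng.normal(size=d)
patched = distributed_interchange_intervention(base, source, rotation, k)
```

With `rotation` set to the identity matrix, the same function reduces to an ordinary interchange intervention on the first `k` neurons, which is the sense in which the distributed version extends the standard one.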

Neighborhood — ranked by edge-count

Concepts (1)

concept

Methods (3)

method
  • Interchange Intervention
    extends · related_to
    Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
  • Core intervention method used throughout CausalGym; operates on a one-dimensional, non-basis-aligned subspace of activation space.
  • The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
  • Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
  • Full n-dimensional activation replacement; the most expressive intervention tested, used as an upper bound in the appendix.
  • Intervention mode in which multiple interventions are applied simultaneously to the same base computation graph.
  • Training technique that induces specific causal structures in neural networks by co-training with interchange interventions.
  • Evaluation metric measuring how well a trained intervention matches the desired counterfactual model behavior.
  • Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control.
  • Intervention targeting specific dimensional subsets of activation vectors rather than full representations.