method
active
method:1d-distributed-interchange-intervention-1d-dii
1D Distributed Interchange Intervention (1D DII)
Core intervention method used throughout CausalGym; operates on a one-dimensional, non-basis-aligned subspace of activation space.
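The description above can be made concrete with a minimal sketch. It assumes the usual formulation in which the base activation keeps everything orthogonal to a direction and takes the source activation's component along that direction. The function name `dii_1d` and the random vectors are illustrative assumptions, not part of CausalGym's code.

```python
import numpy as np

def dii_1d(base, source, direction):
    """Replace the component of `base` along `direction` with that of `source`.

    Hypothetical helper: `direction` spans the 1D subspace and need not be
    aligned with any single neuron (basis axis).
    """
    a = direction / np.linalg.norm(direction)  # unit vector for the subspace
    # Shift base along `a` by the difference of the two projections.
    return base + (source @ a - base @ a) * a

# Toy activations (assumed data, for illustration only).
rng = np.random.default_rng(0)
base, source, direction = rng.normal(size=(3, 8))
patched = dii_1d(base, source, direction)
```

After the intervention, `patched` agrees with `source` along the chosen direction and with `base` everywhere orthogonal to it. In practice the direction would be learned or estimated by an interpretability method rather than drawn at random.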
Neighborhood — ranked by edge-count
Frameworks (2)
framework
- Linear Representation Hypothesis · implements · The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
- CausalGym · uses · Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym
Methods (1)
method
- Distributed Interchange Intervention · related_to · Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
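The rotate, intervene, rotate-back procedure named in this entry can be sketched as follows. This is an illustration under stated assumptions: the rotation is a random orthogonal matrix obtained by QR decomposition (in practice it would be learned), the intervened subspace is the first `k` rotated coordinates, and the name `dii` is hypothetical.

```python
import numpy as np

def dii(base, source, rotation, k):
    """Hypothetical sketch of a distributed interchange intervention.

    1. Rotate both activations into the new basis.
    2. Copy the first k rotated coordinates from source into base.
    3. Rotate back to the original basis.
    """
    rotated_base = rotation @ base
    rotated_source = rotation @ source
    rotated_base[:k] = rotated_source[:k]  # intervene in the rotated subspace
    return rotation.T @ rotated_base       # inverse of an orthogonal matrix is its transpose

# Toy setup (assumed data): a random orthogonal rotation and two activations.
rng = np.random.default_rng(0)
d = 6
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
base, source = rng.normal(size=(2, d))
patched = dii(base, source, rotation, k=2)
```

With `k=1`, the intervened subspace is a single direction (the first row of `rotation`), which corresponds to the one-dimensional case this entity describes.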
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Fundamental operation for causal abstraction analysis; forces neurons to take values from source inputs to create counterfactuals.
- Existing approach to standardized electronic communication between organizations; Elephant aimed at improving beyond fixed formats like X12.
- Theoretical justification for the methodological choice of 1D DII throughout the benchmark
- Training technique that induces specific causal structures in neural networks by co-training with interchange interventions
- Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
- Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
- Differentiable training objective minimized when a high-level model is an abstraction of a neural network under a given alignment.
- Concept of self as extended and co-constituted by interactions, per Mahāyāna.