method
active
method:1d-distributed-interchange-intervention-1d-dii

1D Distributed Interchange Intervention (1D DII)

Core intervention method used throughout CausalGym; it operates on a one-dimensional, non-basis-aligned subspace of activation space.

Neighborhood (ranked by edge count)

Frameworks (2)

framework
  • The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
  • Multi-task benchmark of linguistic behaviors for measuring the causal efficacy of interpretability methods; adapted from SyntaxGym

Methods (1)

method
  • Extends interchange interventions to non-standard bases by rotating representations, intervening in rotated subspaces, then rotating back.
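The rotate, intervene, rotate-back recipe can be sketched as follows, assuming an orthogonal matrix R (so that R⁻¹ = Rᵀ) and an intervention on the first k rotated coordinates. The names `rotated_interchange`, `rotation`, and `k` are hypothetical and illustrative, not taken from any particular library.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    # Computes m @ v.
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def _transpose_matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    # Computes m^T @ v without materializing the transpose.
    n_cols = len(m[0])
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(n_cols)]


def rotated_interchange(base: Sequence[float], source: Sequence[float],
                        rotation: Matrix, k: int) -> List[float]:
    """Hypothetical interchange in a rotated basis.

    Assumes `rotation` is orthogonal. The first k rotated coordinates are
    taken from the source, the rest from the base, and the mixed vector
    is mapped back with the transpose (the inverse of an orthogonal matrix).
    """
    rb = _matvec(rotation, base)
    rs = _matvec(rotation, source)
    mixed = rs[:k] + rb[k:]
    return _transpose_matvec(rotation, mixed)


# 45-degree rotation in 2D; intervene on one rotated coordinate.
c = 2 ** -0.5
R = [[c, c], [-c, c]]
print(rotated_interchange([1.0, 2.0], [4.0, 6.0], R, k=1))
```

With k = 1 this is exactly a 1D DII whose direction is the first row of R: here that row is (1, 1)/√2, and the output equals the base shifted along that direction by the difference of the two projections.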

Related by similarity (8)

Criterion: cosine similarity ≥ 0.65, no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.