Subspace Intervention

An intervention that targets a specific subset of an activation vector's dimensions rather than the full representation
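A minimal sketch of the idea, assuming the simplest case where the subspace is an axis-aligned set of coordinates and the intervention is an interchange (values copied from a second "source" activation). The function name `subspace_interchange` and the toy vectors are hypothetical and do not come from any particular library.

```python
import numpy as np

def subspace_interchange(base, source, dims):
    """Return a copy of `base` whose coordinates in `dims` are taken from `source`.

    Only the selected dimensions are overwritten; all other coordinates of the
    base activation are left untouched.
    """
    base = np.asarray(base, dtype=float)
    source = np.asarray(source, dtype=float)
    if base.shape != source.shape:
        raise ValueError("base and source must have the same shape")
    out = base.copy()                      # do not mutate the caller's array
    out[..., dims] = source[..., dims]     # swap in the chosen subspace only
    return out

# Toy activations (illustrative values only).
base = np.array([1.0, 2.0, 3.0, 4.0])
source = np.array([10.0, 20.0, 30.0, 40.0])

# Intervene on dimensions 1 and 3; dimensions 0 and 2 keep their base values.
patched = subspace_interchange(base, source, dims=[1, 3])
```

A full-representation intervention would correspond to passing every index in `dims`; methods that learn a rotation first (such as the Distributed Alignment Search entry below) apply the same swap in a rotated basis rather than the raw coordinate basis.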

Neighborhood — ranked by edge-count

method

Distributed Alignment Search
uses
The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.

concept

Neural Network Intervention
extends
The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Serial Intervention · concept · 0.799
Intervention mode where interventions are applied sequentially, each building on the previous one
Serializable Intervention · concept · 0.799
pyvene's approach of storing interventions as shareable serialized objects rather than runtime code
Subspace DAS · method · 0.798
Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.
Intervention Propagation · concept · 0.794
Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
Emotion Subspace · concept · 0.791
The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
Intervention on a balanced subspace dimension while holding others fixed crosses the decision boundary using a non-native mechanism · finding · 0.782
Additional synthetic example of pernicious divergence from balanced subspaces
Parallel Intervention · concept · 0.775
Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
Path-Based Activation Intervention · method · 0.771
The general experimental approach of intervening along geometrically-defined paths rather than single-point or linear-direction interventions