concept
active
concept:subspace-intervention
Subspace Intervention
An intervention that targets a specific subset of dimensions of an activation vector rather than the full representation.
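As a rough illustration of the definition, the sketch below overwrites only a chosen subset of coordinates of a "base" activation with the corresponding coordinates from a "source" activation, leaving every other dimension untouched. The function name `subspace_interchange`, the toy vectors, and the chosen indices are hypothetical and not taken from any particular library.

```python
import numpy as np

def subspace_interchange(base, source, dims):
    """Return a copy of `base` whose coordinates in `dims` come from `source`.

    Hypothetical helper for illustration: only the listed dimensions are
    intervened on; all other dimensions keep their base values.
    """
    base = np.asarray(base, dtype=float)
    source = np.asarray(source, dtype=float)
    if base.shape != source.shape:
        raise ValueError("base and source must have the same shape")
    patched = base.copy()                    # do not mutate the caller's array
    patched[..., dims] = source[..., dims]   # overwrite the selected subset only
    return patched

# Toy activations standing in for a hidden state (assumed values).
base = np.array([1.0, 2.0, 3.0, 4.0])
source = np.array([10.0, 20.0, 30.0, 40.0])
patched = subspace_interchange(base, source, dims=[1, 3])
```

A full-representation intervention would be the special case where `dims` covers every index.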
Neighborhood — ranked by edge-count
Methods (1)
method
- The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
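The method entry above concerns distributed representations, where the relevant subspace need not line up with individual neurons. A minimal sketch of intervening in such a subspace, assuming an orthogonal rotation is already available: rotate both activations into the new basis, swap the first `k` coordinates, and rotate back. Here a random orthogonal matrix stands in for a learned alignment, the gradient-descent training step is omitted entirely, and the name `rotated_subspace_interchange` is hypothetical.

```python
import numpy as np

def rotated_subspace_interchange(base, source, rotation, k):
    """Swap the first k coordinates of `base` and `source` in a rotated basis.

    `rotation` is assumed to be an orthogonal (d, d) matrix whose columns are
    the basis directions, so `rotation.T @ h` gives coordinates in that basis.
    """
    base_rot = rotation.T @ base        # base activation in the rotated basis
    source_rot = rotation.T @ source    # source activation in the rotated basis
    patched_rot = base_rot.copy()
    patched_rot[:k] = source_rot[:k]    # intervene on the k-dim subspace only
    return rotation @ patched_rot       # map back to the original basis

rng = np.random.default_rng(0)
d, k = 6, 2
# Random orthogonal matrix via QR; a stand-in for a learned rotation.
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
base = rng.normal(size=d)
source = rng.normal(size=d)
patched = rotated_subspace_interchange(base, source, rotation, k)
```

With `rotation` set to the identity matrix, this reduces to swapping the first `k` raw dimensions.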
Concepts (1)
concept
- Neural Network Intervention · extends · The fundamental operation of making in-place changes to model activations, placing the model in a counterfactual state
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Intervention mode where interventions are applied sequentially, each building on the previous one
- pyvene's approach of storing interventions as shareable serialized objects rather than runtime code
- Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
- The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment
- Additional synthetic example of pernicious divergence from balanced subspaces
- Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
- The general experimental approach of intervening along geometrically-defined paths rather than single-point or linear-direction interventions