Path-Based Activation Intervention

The general experimental approach of intervening along geometrically defined paths in activation space, rather than at a single point or along a single linear direction.
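
The contrast between a linear-direction intervention and a path-based one can be illustrated with a minimal sketch. It assumes a toy 2-D "activation space" in which the concept lies on a unit-circle arc; the function names `linear_intervention` and `path_intervention`, the waypoint representation, and the arc itself are hypothetical choices made for illustration, not a method taken from the source.

```python
import math

def linear_intervention(start, end, t):
    """Move a fraction t of the way from start to end in a straight line."""
    return [s + t * (e - s) for s, e in zip(start, end)]

def path_intervention(waypoints, t):
    """Move a fraction t (by arc length) along an ordered list of waypoints.

    The waypoints are an assumed discretisation of a curved path; between
    neighbouring waypoints the walk is piecewise linear.
    """
    seg_lengths = [math.dist(a, b) for a, b in zip(waypoints, waypoints[1:])]
    target = t * sum(seg_lengths)
    for a, b, length in zip(waypoints, waypoints[1:], seg_lengths):
        if length > 0 and target <= length:
            return linear_intervention(a, b, target / length)
        target -= length
    return list(waypoints[-1])

# Toy curved path: a quarter of the unit circle, sampled at 33 waypoints.
arc = [
    (math.cos(k * math.pi / 64), math.sin(k * math.pi / 64))
    for k in range(33)
]

# Halfway along the curved path versus halfway along the straight chord.
on_path = path_intervention(arc, 0.5)
on_chord = linear_intervention(arc[0], arc[-1], 0.5)

# Distance from the origin: about 1 means the point is still on the arc.
radius_on_path = math.hypot(*on_path)
radius_on_chord = math.hypot(*on_chord)
```

In this toy setting the halfway point of the path-based intervention stays close to the arc, while the halfway point of the straight-line intervention cuts across the interior and leaves it. Real activation spaces are high-dimensional and the relevant path would have to be estimated, so this only shows the geometric idea.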

Neighborhood — ranked by edge-count

concept

Manifold Steering
uses
Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Behavior-Optimized Activation Path Recovery · method · 0.802
Method of optimizing activation-space interventions to produce behavioral paths along M_y, then measuring whether the resulting activation trajectories trace M_h curvature
Causal Intervention via Activation Shifting · method · 0.789
Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs
Causal Intervention via Activation Shift · method · 0.787
Intervening in model forward pass by adding/subtracting probe direction to group (b) hidden states to flip truth judgments
Intervention Propagation · concept · 0.775
Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
Subspace Intervention · concept · 0.771
Intervention targeting specific dimensional subsets of activation vectors rather than full representations
Behavior-based Path · concept · 0.763
The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
Direction (activation space) · concept · 0.760
A linear combination of neurons in a layer; the general form of a neural network feature including both individual neurons and other combinations
Activations · concept · 0.754
Internal representations of the model on which probes operate; the method uses activations to rank datapoints.