method
active
method:path-based-activation-intervention
Path-Based Activation Intervention
The general experimental approach of intervening along geometrically defined paths in activation space, rather than at a single point or along a single linear direction.
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
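To make the contrast between straight-line and curved-path steering concrete, here is a minimal sketch. It assumes, purely for illustration, that the concept manifold is a circle (sphere) through two anchor activations, so the curved path can be written as spherical interpolation. The function names `linear_path` and `curved_path`, the anchor vectors, and the choice of a sphere are all hypothetical and do not come from the source entry.

```python
import math

def vector_norm(v):
    """Euclidean norm of a plain-list vector."""
    return math.sqrt(sum(x * x for x in v))

def linear_path(start, end, t):
    """Straight-line interpolation between two activation vectors."""
    return [(1 - t) * s + t * e for s, e in zip(start, end)]

def curved_path(start, end, t):
    """Spherical interpolation between two activation vectors.

    Assumption: start and end have equal norm and are not parallel,
    so every point on the path stays on the same sphere.
    """
    radius = vector_norm(start)
    cos_angle = sum(s * e for s, e in zip(start, end)) / (radius * radius)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    w_start = math.sin((1 - t) * angle) / math.sin(angle)
    w_end = math.sin(t * angle) / math.sin(angle)
    return [w_start * s + w_end * e for s, e in zip(start, end)]

# Two hypothetical anchor activations on the unit circle.
start = [1.0, 0.0]
end = [0.0, 1.0]

mid_linear = linear_path(start, end, 0.5)  # cuts through the interior
mid_curved = curved_path(start, end, 0.5)  # stays on the circle

print(vector_norm(mid_linear), vector_norm(mid_curved))
```

In this toy setting the midpoint of the straight line falls inside the circle, while the midpoint of the curved path keeps the same norm as the anchors. That is the sense in which a path-based intervention can remain on the manifold where a linear one leaves it.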
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Method of optimizing activation-space interventions to produce behavioral paths along M_y, then measuring whether the resulting activation trajectories trace M_h curvature
- Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs
- Intervening in model forward pass by adding/subtracting probe direction to group (b) hidden states to flip truth judgments
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
- Intervention targeting specific dimensional subsets of activation vectors rather than full representations
- The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
- A linear combination of neurons in a layer; the general form of a neural network feature including both individual neurons and other combinations
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
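Several of the related entries above describe single-direction interventions: adding or subtracting a probe direction to hidden states, optionally restricted to a subset of dimensions. The sketch below shows one way such a shift might look on a plain-list vector. The function name `shift_along_probe`, the normalization of the probe direction, and the example values are assumptions made for illustration, not details taken from those entries.

```python
import math

def shift_along_probe(hidden, probe_direction, alpha, dims=None):
    """Return a copy of hidden shifted by alpha along the unit probe direction.

    dims: optional iterable of indices; when given, only those coordinates
    are modified (a dimensional-subset intervention). Normalizing the probe
    direction is an assumption of this sketch.
    """
    length = math.sqrt(sum(p * p for p in probe_direction))
    unit = [p / length for p in probe_direction]
    selected = set(range(len(hidden))) if dims is None else set(dims)
    return [
        h + alpha * u if i in selected else h
        for i, (h, u) in enumerate(zip(hidden, unit))
    ]

# Hypothetical hidden state and probe direction.
hidden = [0.5, -1.0, 2.0]
probe = [3.0, 0.0, 4.0]  # unit direction is [0.6, 0.0, 0.8]

pushed = shift_along_probe(hidden, probe, alpha=2.0)           # add the direction
pulled = shift_along_probe(hidden, probe, alpha=-2.0)          # subtract it
partial = shift_along_probe(hidden, probe, alpha=2.0, dims=[0])  # first coordinate only

print(pushed, pulled, partial)
```

A positive `alpha` moves the hidden state toward one side of the probe's decision boundary and a negative `alpha` toward the other. A path-based intervention would replace this single fixed direction with a sequence of positions along a curve.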