method
active
method:path-based-activation-intervention

Path-Based Activation Intervention

The general experimental approach of intervening on activations along geometrically defined paths, rather than at a single point or along a single linear direction.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
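The contrast between straight-line and manifold-following interventions can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the concept manifold is taken to be a unit circle in a 2-D activation subspace, and the helper names (`circle_point`, `linear_path`, `manifold_path`) are hypothetical, not part of any library or of the method as catalogued. The sketch only shows that a linear interpolation between two on-manifold activations leaves the manifold, while a path parameterized by the manifold coordinate stays on it.

```python
import numpy as np

def circle_point(theta, radius=1.0):
    """Point on a toy concept manifold: a circle in a 2-D activation subspace."""
    return radius * np.array([np.cos(theta), np.sin(theta)])

def linear_path(h_start, h_end, n_steps):
    """Straight-line interpolation between two activations (linear-direction steering)."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return np.array([(1.0 - t) * h_start + t * h_end for t in ts])

def manifold_path(theta_start, theta_end, n_steps, radius=1.0):
    """Interpolate the manifold coordinate, then map each value back to activation space."""
    thetas = np.linspace(theta_start, theta_end, n_steps)
    return np.array([circle_point(th, radius) for th in thetas])

# Two concept values on the toy manifold (assumed angles, chosen for illustration).
theta_a, theta_b = 0.0, np.pi / 2

straight = linear_path(circle_point(theta_a), circle_point(theta_b), n_steps=5)
curved = manifold_path(theta_a, theta_b, n_steps=5)

# On-manifold points have norm equal to the radius; off-manifold points do not.
straight_norms = np.linalg.norm(straight, axis=1)
curved_norms = np.linalg.norm(curved, axis=1)

print("straight-line norms:", np.round(straight_norms, 3))
print("manifold-path norms:", np.round(curved_norms, 3))
```

Both paths share the same endpoints, but the intermediate points of the straight path cut through the interior of the circle. In a real model the manifold would have to be estimated from activations rather than given in closed form, so this is a geometric intuition aid rather than an implementation of the method.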

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Method of optimizing activation-space interventions to produce behavioral paths along M_y, then measuring whether the resulting activation trajectories trace M_h curvature
  • Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs
  • Intervening in model forward pass by adding/subtracting probe direction to group (b) hidden states to flip truth judgments
  • Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
  • Intervention targeting specific dimensional subsets of activation vectors rather than full representations
  • The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
  • A linear combination of neurons in a layer; the general form of a neural network feature including both individual neurons and other combinations
  • Activations (concept, 0.754)
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.