method
active
method:pullback-steering

Pullback Steering

A method that optimizes steering interventions in activation space so that the model's outputs follow the behavior manifold, independent of the representation manifold.
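The idea of optimizing an intervention against an output-space target can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's setup: the linear readout `W`, the base activation `h0`, the softmax output, the hand-made target curve standing in for a behavior manifold, and the helper name `optimize_steering`. The sketch fits one activation offset per target distribution by gradient descent on cross-entropy, never consulting any representation geometry.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def optimize_steering(h0, W, target, steps=5000):
    """Hypothetical helper: find an offset delta such that
    softmax(W @ (h0 + delta)) approaches the target distribution."""
    # Step size 1/L, where L bounds the curvature of the cross-entropy loss
    # (softmax Hessian eigenvalues are at most 1/2).
    lr = 1.0 / (0.5 * np.linalg.norm(W, 2) ** 2)
    delta = np.zeros_like(h0)
    for _ in range(steps):
        p = softmax(W @ (h0 + delta))
        # Gradient of cross-entropy(target, p) w.r.t. the activation.
        delta -= lr * (W.T @ (p - target))
    return delta

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))      # assumed readout: 8-dim activation -> 3 logits
h0 = rng.normal(size=8)          # assumed base activation

# Assumed one-dimensional curve in output probability space.
ts = np.linspace(0.0, 1.0, 5)
targets = [softmax(np.array([2 * np.cos(np.pi * t), 2 * np.sin(np.pi * t), 0.0]))
           for t in ts]

# One optimized intervention per target; the steered activations form a path.
path = [h0 + optimize_steering(h0, W, q) for q in targets]
errors = [np.abs(softmax(W @ h) - q).max() for h, q in zip(path, targets)]
```

In this toy setting `path` is defined purely by what the outputs should be, so it need not be a straight line in activation space. A real model would replace the linear readout with a forward pass from the intervened layer.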

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • One-dimensional curved surface in output probability space; the paper shows this mirrors representation manifold structure.
  • The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.

Conceptual bridges

2-hop · via this method's ideas

Where ideas in this method connect to the rest of the corpus — the same concept, an analogy, or a restatement elsewhere.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Model Steering · concept · 0.776
    Using interventions to guide model generation behavior, e.g., adding sentiment vectors at inference time
  • Paradigm of finding the right direction in activation space (e.g., linear steering).
  • Causal intervention technique: edit NLA explanation, reconstruct via AR, use difference as steering vector to manipulate model behavior.
  • linear steering · method · 0.761
    Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
  • steering vectors · concept · 0.757
    A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
  • Ability to steer model behavior in two opposite semantic directions on a trait.
  • Concept Steering · method · 0.753
    Latent intervention technique that manipulates sparse features to steer model predictions toward desired concepts.