Pullback Steering

A method that optimizes steering interventions in activation space so that the model's outputs follow the behavior manifold, without relying on the geometry of the representation manifold.

Neighborhood — ranked by edge-count

Papers (1)

paper

Steering Along Manifolds to Control Neural Networks
introduces

Concepts (2)

concept

behavior manifold
uses
A one-dimensional curve in output probability space; the paper shows that it mirrors the structure of the representation manifold.
Behavior-based Path
introduces
The path in activation space derived by optimizing steering interventions to produce outputs along the behavior manifold, independent of representation geometry.
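
As a rough illustration of the idea described above, the sketch below optimizes an additive intervention so that a toy model's output distribution lands on successive points of a one-dimensional curve in probability space. Everything here is an assumption made for illustration and is not taken from the paper: the linear-softmax readout `W`, the base activation `h`, the curve `behavior_curve`, the KL objective, and the plain gradient-descent loop are all hypothetical stand-ins.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(t, p):
    # KL(t || p) for strictly positive distributions.
    return float(np.sum(t * (np.log(t) - np.log(p))))

def behavior_curve(s):
    # Hypothetical 1-D behavior manifold: a curve through the probability simplex.
    logits = np.array([np.cos(np.pi * s), np.sin(np.pi * s), -np.cos(np.pi * s)])
    return softmax(logits)

def steer_to_target(W, h, target, lr=1.0, steps=300):
    # Gradient descent on KL(target || softmax(W @ (h + delta))) with respect to delta.
    delta = np.zeros_like(h)
    for _ in range(steps):
        p = softmax(W @ (h + delta))
        # For a linear-softmax readout the gradient is W^T (p - target).
        delta -= lr * (W.T @ (p - target))
    return delta

rng = np.random.default_rng(0)
# Toy readout with orthonormal rows (assumption; keeps the problem well-conditioned).
Q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
W = Q.T                      # maps an 8-dim activation to 3 output logits
h = rng.normal(size=8)       # hypothetical base activation

ts = np.linspace(0.0, 1.0, 5)
# One steered activation per point on the curve: a path in activation space.
path = [h + steer_to_target(W, h, behavior_curve(s)) for s in ts]
final_kl = [kl(behavior_curve(s), softmax(W @ a)) for s, a in zip(ts, path)]
```

The path is defined only by what the outputs do; nothing in the loop refers to the shape of the activations themselves, which is the sense in which such a path is independent of representation geometry.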

Conceptual bridges

2-hop · via this method's ideas

Where ideas in this method connect to the rest of the corpus — the same concept, an analogy, or a restatement elsewhere.

Behavior-based Path
~ Representation-based Path · ai

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Model Steering · concept · 0.776
Using interventions to guide model generation behavior, e.g., adding sentiment vectors at inference time
direction-based steering · concept · 0.770
Paradigm of finding the right direction in activation space (e.g., linear steering).
Activation Steering · method · 0.768
Causal intervention technique: edit NLA explanation, reconstruct via AR, use difference as steering vector to manipulate model behavior.
linear steering · method · 0.761
Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
Pullback Geometry (Behavior-Aware Metric) · concept · 0.760
steering vectors · concept · 0.757
A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
Bidirectional Steering · concept · 0.754
Ability to steer model behavior in two opposite semantic directions on a trait.
Concept Steering · method · 0.753
Latent intervention technique that manipulates sparse features to steer model predictions toward desired concepts.
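
For contrast with the optimization-based method on this page, the linear steering and bidirectional steering entries in the list above reduce to adding a scaled direction to an activation. The sketch below is a minimal illustration; the function name `linear_steer`, the unit normalization, and the example vectors are assumptions for illustration and do not come from any of the listed sources.

```python
import numpy as np

def linear_steer(h, v, alpha):
    # Add a scaled steering direction to an activation.
    # Normalizing v (an assumption here) lets alpha set the step size directly.
    v_unit = v / np.linalg.norm(v)
    return h + alpha * v_unit

h = np.array([1.0, 0.0, 2.0])   # hypothetical activation
v = np.array([0.0, 3.0, 4.0])   # hypothetical steering direction

pos = linear_steer(h, v, 2.0)    # steer toward the trait
neg = linear_steer(h, v, -2.0)   # steer in the opposite direction
```

Flipping the sign of `alpha` moves the activation the same distance in the opposite direction along a straight line, which is the simplest form of bidirectional steering.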