concept
active
concept:geometry-based-steering
geometry-based steering
Paradigm of finding the right geometry (manifold) in activation space for principled control of model behavior.
Neighborhood — ranked by edge-count
Concepts (2)
- direction-based steering (related_to): Paradigm of finding the right direction in activation space (e.g., linear steering).
- Manifold Steering (implements): Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
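To make the contrast between the two paradigms above concrete, here is a minimal sketch. It assumes a toy setting in which the concept manifold is simply the unit sphere; that choice, the function names `linear_steer` and `geodesic_steer`, and all parameters are illustrative assumptions, not the method of any specific paper or library.

```python
import numpy as np

def linear_steer(h, v, alpha):
    """Direction-based steering: move in a straight line through activation space."""
    return h + alpha * v

def geodesic_steer(h, target, t):
    """Toy geometry-based steering: move along the great circle from h to target.

    Assumption: the manifold is the unit sphere, so the on-manifold path is
    spherical linear interpolation (slerp). Not defined for antipodal points.
    """
    h = h / np.linalg.norm(h)
    target = target / np.linalg.norm(target)
    omega = np.arccos(np.clip(np.dot(h, target), -1.0, 1.0))
    if omega < 1e-8:
        # The two points coincide, so there is nothing to interpolate.
        return h.copy()
    return (np.sin((1 - t) * omega) * h + np.sin(t * omega) * target) / np.sin(omega)

h = np.array([1.0, 0.0, 0.0])        # hypothetical current activation
target = np.array([0.0, 1.0, 0.0])   # hypothetical target concept point

straight = linear_steer(h, target - h, 0.5)  # halfway along the chord
curved = geodesic_steer(h, target, 0.5)      # halfway along the arc

print(np.linalg.norm(straight))  # below 1: the point has left the sphere
print(np.linalg.norm(curved))    # 1 up to rounding: the point stays on the sphere
```

In this toy case the straight-line midpoint falls inside the sphere (an off-manifold excursion), while the geodesic midpoint keeps unit norm. A real representation manifold would have to be estimated from data rather than assumed.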
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The overarching theoretical framework proposed in the paper, asserting that steering interventions should be aligned with the geometric structure of the model's representation manifold.
- Linear steering implicitly assumes a flat, Euclidean activation space, leading to off-manifold excursions.
- Novel method that applies the intervention only when the model begins a new thinking step (at the \n\n delimiter) rather than at every token.
- A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
- Ability to steer model behavior in two opposite semantic directions on a trait.
- The actual shapes and spatial relationships of buildings, essential to living structure.
- Extension of manifold steering validation to video world models and physical dynamics tasks, demonstrating cross-modal generality.
- General technique of modifying activations to control model behavior.
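
One neighbor in the list above describes applying an intervention only at the `\n\n` step delimiter rather than at every token. A rough sketch of that gating idea follows. It assumes the delimiter appears as its own token and that steering is a simple vector addition; `step_boundary_mask`, `gated_steer`, and the example data are hypothetical names and values chosen for illustration.

```python
import numpy as np

def step_boundary_mask(tokens, delimiter="\n\n"):
    """Mark positions whose token is the step delimiter (assumed to be a single token)."""
    return np.array([tok == delimiter for tok in tokens], dtype=bool)

def gated_steer(acts, v, alpha, mask):
    """Add alpha * v to the activations at masked positions only."""
    out = acts.copy()
    out[mask] += alpha * v
    return out

tokens = ["Step", " one", "\n\n", "Step", " two"]  # hypothetical token strings
acts = np.zeros((len(tokens), 3))                  # hypothetical activations, one row per token
v = np.array([1.0, 0.0, 0.0])                      # hypothetical steering vector

mask = step_boundary_mask(tokens)
steered = gated_steer(acts, v, alpha=2.0, mask=mask)
print(mask)
print(steered)
```

Only the row at the delimiter position changes; every other token's activation is left untouched, in contrast to per-token steering, which would modify all rows.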