finding
active
finding:our-method-enables-bidirectional-steering-of-model-behavior
Our method enables bidirectional steering of model behavior.
The method can steer the model in both positive and negative directions on the target semantic.
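As a rough illustration of what steering in both directions means, the sketch below uses generic activation addition with a signed coefficient. This is an assumption made for illustration and is not the paper's method, which this entry does not describe. The names `steer`, `hidden`, `direction`, and `alpha` are hypothetical, and the activations are random placeholders rather than real model states.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Shift activations along a unit-normalized direction by a signed amount.

    Generic activation-addition sketch (an assumption, not the paper's method).
    alpha > 0 pushes toward the target semantic, alpha < 0 pushes away from it.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))   # placeholder activations: 4 tokens, width 16
direction = rng.normal(size=16)     # placeholder direction for the target semantic
unit = direction / np.linalg.norm(direction)

pos = steer(hidden, direction, alpha=2.0)    # positive steering
neg = steer(hidden, direction, alpha=-2.0)   # negative steering

# Change in each token's projection onto the direction, relative to the baseline.
base_proj = hidden @ unit
pos_shift = pos @ unit - base_proj
neg_shift = neg @ unit - base_proj
```

Under these assumptions, the same direction serves both cases and only the sign of `alpha` changes, so the projection onto the direction moves up for positive steering and down for negative steering by the same magnitude.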
Source paper
extracted_from · (2026) · Ruikang Zhang · Shuo Wang · Q. Su
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Our findings provide a novel, robust mechanistic path for the regulation of complex AI behaviors. · supports · Interpretation that the work opens a new avenue for controlling complex AI.
Communities (3)
community
- Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
- Concepts encoded as curved manifolds and circular structures in LLM activation spaces.
- Using geometric structure of learned representations to interpret and control model behavior through concept operators.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Ability to steer model behavior in two opposite semantic directions on a trait.
- Extension of manifold steering validation to video world models and physical dynamics tasks, demonstrating cross-modal generality.
- The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
- Using interventions to guide model generation behavior, e.g., adding sentiment vectors at inference time.
- The paper's finding that the alignment holds in both directions — from representation to behavior and from behavior back to representation space.
- Central claim of the paper; supported by the model organism ground-truth approach.
- Central motivating question of the paper; the model organism approach is the proposed answer.