Our method enables bidirectional steering of model behavior.

The method can steer the model in either the positive or the negative direction along the target semantic feature.
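Below is a minimal sketch of bidirectional steering along a single sparse-autoencoder feature. It assumes the common recipe of adding a scaled, unit-norm SAE decoder direction to a residual-stream activation. The function `steer`, the random `decoder` matrix, and the coefficient `alpha` are hypothetical illustrations, not the paper's implementation. In this sketch, the sign of `alpha` alone selects the direction.

```python
import numpy as np


def steer(hidden: np.ndarray, decoder: np.ndarray, feature_idx: int, alpha: float) -> np.ndarray:
    """Shift activations along one SAE feature's unit-norm decoder direction.

    hidden:  activations, shape (..., d_model)
    decoder: hypothetical SAE decoder matrix, shape (n_features, d_model)
    alpha:   steering strength; > 0 amplifies the feature, < 0 suppresses it
    """
    direction = decoder[feature_idx]
    direction = direction / np.linalg.norm(direction)  # unit norm, so alpha is in activation units
    return hidden + alpha * direction  # broadcasts over leading (token) dimensions


# Toy example: random weights stand in for a trained SAE (assumption).
rng = np.random.default_rng(0)
d_model, n_features, feature_idx = 16, 64, 7
decoder = rng.normal(size=(n_features, d_model))
hidden = rng.normal(size=(4, d_model))  # 4 token positions

unit = decoder[feature_idx] / np.linalg.norm(decoder[feature_idx])
baseline = hidden @ unit
pushed_up = steer(hidden, decoder, feature_idx, alpha=3.0) @ unit
pushed_down = steer(hidden, decoder, feature_idx, alpha=-3.0) @ unit

print(np.round(pushed_up - baseline, 6))    # projection rises by 3 at every position
print(np.round(pushed_down - baseline, 6))  # projection falls by 3 at every position
```

Only the sign of `alpha` differs between the two calls, and that is the sense in which the intervention is bidirectional. Whether the model's generated text actually shifts toward or away from the target semantic still has to be checked empirically.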

Source paper

extracted_from

Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders

(2026) · Ruikang Zhang · Shuo Wang · Q. Su

Neighborhood — ranked by edge-count

Papers (1)

paper

Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders
mentions

Claims (1)

claim

Our findings provide a novel, robust mechanistic path for the regulation of complex AI behaviors.
supports
Interpretation that the work opens a new avenue for controlling complex AI.

Communities (3)

community

Manifold-aware concept steering in neural representations
members_of
Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
Geometric concept representations in neural networks
members_of
Concepts encoded as curved manifolds and circular structures in LLM activation spaces.
Concept geometry and steering in neural networks
members_of
Using geometric structure of learned representations to interpret and control model behavior through concept operators.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Bidirectional Steering · concept · 0.844
Ability to steer model behavior in two opposite semantic directions on a trait.
Manifold steering demonstrates a bidirectional geometry-behavior link in a video world model on tasks whose geometry corresponds to physical dynamics · finding · 0.834
Extension of manifold steering validation to video world models and physical dynamics tasks, demonstrating cross-modal generality
Optimally steering model behavior requires isolating concept geometry and defining operators to navigate it. · claim · 0.833
Linear steering is often mismatched with a model's internal representation geometry, producing noisy, off-target effects. · claim · 0.801
The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
Model Steering · concept · 0.791
Using interventions to guide model generation behavior, e.g., adding sentiment vectors at inference time
There is a clear bidirectional relationship between the geometry of behavior and representation: steering along representation manifolds follows behavior manifolds, and vice versa. · claim · 0.791
The paper's finding that the alignment holds in both directions — from representation to behavior and from behavior back to representation space.
Activation steering can make an evaluation-aware model act as if deployed, not merely suppress verbalizations of evaluation awareness · claim · 0.790
Central claim of the paper; supported by the model organism ground-truth approach.
How can we be sure that steering methods actually elicited the deployment behavior, as opposed to only suppressing verbalizations of being deployed? · question · 0.782
Central motivating question of the paper; the model organism approach is the proposed answer.