finding

active

finding:steering-along-m-h-yields-behavioral-trajectories-that-follow-m-y-producing-more-natural-outputs-than-linear-steering

Steering along M_h yields behavioral trajectories that follow M_y, producing more natural outputs than linear steering

Core empirical result demonstrating the superiority of manifold steering over linear steering

Source paper

extracted_from

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

(2026) · Daniel Wurgaft · Can Rager · Matthew Kowal · Vasudev Shyam +12

Neighborhood — ranked by edge-count

Claims (3)

claim

Geometric structure of neural representations causally shapes model behavior
supports
The paper's core causal assertion: geometry is not incidental but mechanistically linked to behavior
Linear steering cuts through off-manifold regions and hence produces unnatural outputs.
supports
Attribution of failure to Euclidean assumption.
There exists a bidirectional relationship between the geometry of neural representation and the geometry of model behavior
supports
Central empirical claim of the paper, demonstrated across tasks and modalities
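
The claim that linear steering cuts through off-manifold regions can be illustrated with a toy sketch. This is not the paper's procedure: it assumes the manifold is a known unit circle (a stand-in for a cyclic concept such as days of the week), whereas the paper works with manifolds in a model's activation space. All function names here are hypothetical.

```python
import math

def on_circle(theta):
    """Point on the unit circle, a toy stand-in for a cyclic manifold M_h."""
    return (math.cos(theta), math.sin(theta))

def linear_path(theta_a, theta_b, steps):
    """Straight-line (Euclidean) interpolation between two manifold points."""
    (ax, ay), (bx, by) = on_circle(theta_a), on_circle(theta_b)
    ts = [i / steps for i in range(steps + 1)]
    return [((1 - t) * ax + t * bx, (1 - t) * ay + t * by) for t in ts]

def manifold_path(theta_a, theta_b, steps):
    """Interpolate the angle instead, so every point stays on the circle."""
    return [on_circle(theta_a + (theta_b - theta_a) * i / steps)
            for i in range(steps + 1)]

def max_off_manifold(path):
    """Largest distance from the circle, |1 - radius|, along a path."""
    return max(abs(1.0 - math.hypot(x, y)) for x, y in path)

# Steer between two points 2 radians apart on the circle.
linear_dev = max_off_manifold(linear_path(0.0, 2.0, 10))
manifold_dev = max_off_manifold(manifold_path(0.0, 2.0, 10))
print(f"linear: {linear_dev:.3f}  manifold: {manifold_dev:.1e}")
```

In this toy setting the straight chord dips well inside the circle at its midpoint, while the angle-interpolated path never leaves it, which is the geometric intuition behind the claim.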

Hypotheses (1)

hypothesis

We hypothesize that interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally
associated_with · supports
The core testable hypothesis driving the experimental design

Questions (1)

question

Does the geometric structure of neural representations causally shape model behavior?
answered_by
The motivating research question of the paper

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
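
The selection rule stated above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. The embedding vectors, entity names, and the `typed_edges` set are made up for illustration, and sorting by descending score is an assumption based on how the scores in this section appear to be ordered.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the cosine threshold that have no typed edge to target."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical 2-D embeddings, for illustration only.
embeddings = {
    "this_finding": [1.0, 0.0],
    "near_claim": [0.8, 0.6],    # similar, no typed edge
    "far_claim": [0.0, 1.0],     # below the threshold
    "linked_claim": [1.0, 0.1],  # similar, but already has a typed edge
}
result = related_by_similarity("this_finding", embeddings,
                               typed_edges={"linked_claim"})
print(result)
```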

Manifold-respecting steering produces smooth natural behavioral trajectories while linear steering teleports between non-adjacent concepts.
claim · 0.815
Manifold steering produces clean probability shifts along natural behavior structure; linear steering cuts across the manifold and produces noisy, off-target effects.
finding · 0.809
Empirical demonstration on Llama-3.1-8B that steering along representation manifold aligns outputs with behavior manifold, whereas linear steering does not.
Linear steering is often mismatched with a model's internal representation geometry, producing noisy, off-target effects.
claim · 0.807
The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
Linear steering produces noisy off-target effects; manifold steering cleanly shifts probability mass between sequential concepts.
finding · 0.803
Core empirical claim comparing steering approaches on cyclic concepts.
Steering along manifolds provides better control than linear steering when the concept geometry is non-linear.
claim · 0.797
The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
Interventions along activation manifold M_h yield behavioral trajectories following behavior manifold M_y, and vice versa: a bidirectional relationship demonstrated across language models and video world models.
finding · 0.794
Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
Some steering vectors produce more salient perturbations than others, perhaps based on shared semantic or qualitative factors.
claim · 0.779
Observation from 100% accuracy on specific concept-layer-strength combinations suggesting concept-specific detectability
Manifold steering demonstrates bidirectional geometry-behavior link in a video world model on tasks with geometry corresponding to physical dynamics.
finding · 0.773
Extension of manifold steering validation to video world models and physical dynamics tasks, demonstrating cross-modal generality