hypothesis

active

hypothesis:we-hypothesize-that-interventions-that-respect-the-geometry-of-activation-space-will-yield-behaviors-close-to-those-the-model-exhibits-naturally

We hypothesize that interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally

The core testable hypothesis driving the experimental design

Source paper

extracted_from

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

(2026) · Daniel Wurgaft · Can Rager · Matthew Kowal · Vasudev Shyam +12

Neighborhood — ranked by edge-count

Findings (1)

finding

Steering along M_h yields behavioral trajectories that follow M_y, producing more natural outputs than linear steering
associated_with · supports
Core empirical result demonstrating the superiority of manifold steering over linear steering

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
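The selection rule described above (cosine similarity at or above 0.65, and no typed edge to this entity) can be sketched as a simple filter. This is a minimal illustration, not the tool's actual implementation: the function names, the `embeddings` dictionary, and the `typed_edges` set of unordered ID pairs are all hypothetical, and the page does not say how embeddings are computed.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_candidates(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to the target but lacking a typed edge.

    embeddings:  hypothetical dict mapping entity ID -> embedding vector
    typed_edges: hypothetical set of frozenset({id_a, id_b}) pairs that already
                 have a typed relation
    """
    target = embeddings[target_id]
    candidates = []
    for other_id, vector in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge
        score = cosine(target, vector)
        if score >= threshold:
            candidates.append((other_id, round(score, 3)))
    # Highest similarity first, matching the ordering of the list on this page
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score may indicate a missing edge or a duplicate, and either case would still need review.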

Interventions along activation manifold M_h yield behavioral trajectories following behavior manifold M_y, and vice versa; this bidirectional relationship is demonstrated across language models and video world models. · finding · 0.838
Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
Optimizing interventions in activation space to produce paths along M_y recovers activation trajectories that trace the curvature of M_h. · finding · 0.828
Demonstrates bidirectional causal link: behavior manifold geometry can be recovered by optimizing in representation space.
Representation geometry causally shapes behavior; activation and behavior manifolds are approximately isometric. · claim · 0.827
Does the geometric structure of activation space causally shape neural network behavior? · question · 0.816
Central research question driving the work.
Neural representation geometry causally shapes behavior; interventions respecting that geometry will yield natural trajectories. · hypothesis · 0.795
Central hypothesis tested via manifold steering experiments across language models and video world models.
Linear interventions across voids in activation space produce incoherent output, while following the manifold curve produces smooth control. · claim · 0.792
General principle derived from the Mountain Car experiment: curved manifold-following yields coherent manipulation, linear shortcuts fail.
optimization of interventions to follow behavior manifold M_y · method · 0.786
Method that optimizes activation interventions so that resulting behaviors trace M_y, recovering activation paths that follow M_h curvature.
There is a bidirectional relationship between the geometry of representation and behavior across tasks and modalities. · claim · 0.778
Authors’ interpretive claim that the shared geometry is general and robust.
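The contrast drawn in the claim above, between a linear shortcut across a void and a path that follows the manifold's curve, can be illustrated with a toy geometry. The sketch below is an assumption-laden illustration only: the unit circle stands in for an activation manifold such as M_h, the function names are hypothetical, and nothing here reproduces the paper's models, experiments, or steering method.

```python
import math

def linear_path(a, b, steps):
    """Straight-line interpolation between 2-D points a and b (toy 'linear steering')."""
    return [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
            for t in (i / steps for i in range(steps + 1))]

def arc_path(theta_a, theta_b, steps):
    """Interpolation in angle, so every point stays on the unit circle (toy 'manifold steering')."""
    return [(math.cos(theta), math.sin(theta))
            for theta in (theta_a + (theta_b - theta_a) * i / steps
                          for i in range(steps + 1))]

def off_manifold_distance(point):
    """Distance from a point to the unit circle, the stand-in manifold in this toy."""
    return abs(math.hypot(point[0], point[1]) - 1.0)

# Two antipodal points on the circle: the straight line between them cuts
# through the empty interior, while the arc never leaves the circle.
start, end = (1.0, 0.0), (-1.0, 0.0)
worst_linear = max(off_manifold_distance(p) for p in linear_path(start, end, 4))
worst_arc = max(off_manifold_distance(p) for p in arc_path(0.0, math.pi, 4))
```

In this toy, the midpoint of the straight line is the circle's center, as far from the circle as an interior point can be, whereas every point on the arc stays on the circle up to floating-point error. Whether off-manifold activations produce incoherent behavior in a real model is the empirical question the hypothesis addresses; the toy only shows the geometric distinction.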