hypothesis
active
hypothesis:we-hypothesize-that-interventions-that-respect-the-geometry-of-activation-space-will-yield-behaviors-close-to-those-the-model-exhibits-naturally
We hypothesize that interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally
The core testable hypothesis driving the experimental design
Source paper
extracted_from (2026) · Daniel Wurgaft · Can Rager · Matthew Kowal · Vasudev Shyam +12
Neighborhood — ranked by edge-count
Findings (1)
finding
- Steering along M_h yields behavioral trajectories that follow M_y, producing more natural outputs than linear steering · associated_with · supports · Core empirical result demonstrating the superiority of manifold steering over linear steering
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
- Demonstrates bidirectional causal link: behavior manifold geometry can be recovered by optimizing in representation space.
- Does the geometric structure of activation space causally shape neural network behavior? · question · 0.816 · Central research question driving the work.
- Neural representation geometry causally shapes behavior; interventions respecting that geometry will yield natural trajectories. · hypothesis · 0.795 · Central hypothesis tested via manifold steering experiments across language models and video world models.
- General principle derived from the Mountain Car experiment: curved manifold-following yields coherent manipulation, linear shortcuts fail.
- Method that optimizes activation interventions so that resulting behaviors trace M_y, recovering activation paths that follow M_h curvature.
- Author’s interpretive claim that the shared geometry is general and robust.