finding

active

finding:in-the-mountain-car-case-study-car-position-is-a-1d-manifold-linear-interventions-cross-voids-causing-incoherence-following-the-1d-curve-produces-smooth-control

In the Mountain Car case study, car position is encoded as a 1D manifold. Linear interventions cross voids in activation space and produce incoherent output, whereas following the 1D curve produces smooth control.

Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
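
The contrast between the two kinds of intervention can be illustrated with a toy sketch. It is not the paper's setup: the unit semicircle standing in for the car-position manifold, the 2D "activation space", and every function name below are assumptions made for illustration only. The sketch measures how far each interpolation path strays from the curve.

```python
import math

def on_curve(theta):
    """Point on a toy 1D manifold: the unit semicircle in a 2D 'activation space' (assumed geometry)."""
    return (math.cos(theta), math.sin(theta))

def off_manifold_distance(p):
    """Distance from p to the unit circle; 0 means p lies on the toy manifold."""
    return abs(1.0 - math.hypot(p[0], p[1]))

def linear_path(a, b, steps):
    """Straight-line interpolation between two activations (the 'linear intervention')."""
    return [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
            for t in (i / steps for i in range(steps + 1))]

def manifold_path(theta_a, theta_b, steps):
    """Interpolate the latent coordinate, then map each value back onto the curve."""
    return [on_curve(theta_a + (theta_b - theta_a) * i / steps)
            for i in range(steps + 1)]

# Move from one end of the curve to the other (hypothetical endpoints).
theta_start, theta_end = 0.0, math.pi
linear = linear_path(on_curve(theta_start), on_curve(theta_end), steps=10)
curved = manifold_path(theta_start, theta_end, steps=10)

max_linear_gap = max(off_manifold_distance(p) for p in linear)
max_curved_gap = max(off_manifold_distance(p) for p in curved)
print(f"max off-manifold distance, linear path:   {max_linear_gap:.3f}")
print(f"max off-manifold distance, manifold path: {max_curved_gap:.3f}")
```

In this toy geometry the straight line passes through the empty interior of the semicircle, which plays the role of the void, while the path that interpolates the latent coordinate never leaves the curve.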

Source paper

extracted_from

The World Inside Neural Networks

(2026) · Geiger, Atticus · Lubana, Ekdeep Singh · Fel, Thomas · Merullo, Jack +3

Neighborhood — ranked by edge-count

Claims (3)

claim

Curved manifolds often represent concepts better than linear directions.
supports
Proposes that nonlinear geometric structure is superior to linear feature spaces for capturing semantic content.
Geometry arises from optimization pressure on networks trained on structured data.
supports
Mechanistic explanation: geometric structure emerges naturally from standard training on data with underlying structure.
Linear interventions across voids in activation space produce incoherent output, while following the manifold curve produces smooth control.
supports
General principle derived from the Mountain Car experiment: curved manifold-following yields coherent manipulation, linear shortcuts fail.

Communities (2)

community

Manifold-aware concept steering in neural representations
members_of
Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
Geometric concept representations in neural networks
members_of
Concepts encoded as curved manifolds and circular structures in LLM activation spaces.

Concepts (8)

concept

Mountain Car Case Study
cites · implements
Benchmark demonstrating that car position encodes as a 1D manifold; linear interventions fail by crossing voids; following the geometric curve produces smooth control.
The Void
cites
The property that the most profound centers have at their heart a void like water, infinite in depth, surrounded by and contrasted with the clutter around it; the calm emptiness needed by every center to give it the basis of its strength
Activation space
cites
Representation space on which linear probes operate to attribute harmful behaviors to training data.
1D manifold
cites
A single continuous curve in activation space encoding a single variable, such as car position in the Mountain Car case.
incoherence
cites
Nonsensical or unphysical model outputs that result when interventions cross voids in activation space.
linear intervention
cites
Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
smooth control
cites
Coherent, predictable changes in model behavior achieved by navigating along the learned manifold rather than using straight-line interventions.
car position
cites
The scalar variable representing the car's location in the Mountain Car reinforcement learning problem, found to be encoded as a 1D manifold.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
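
The selection rule described above (cosine similarity of at least 0.65, excluding entities that already have a typed edge) might be sketched as follows. The embedding vectors, entity names, and the function `similar_untyped` are hypothetical; the page does not say how similarity is actually computed beyond the cosine threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_untyped(target, embeddings, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold and no typed edge to target, best first."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue  # skip the entity itself and anything already linked by a typed edge
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "finding_a": [1.0, 0.0],
    "claim_b": [0.8, 0.6],    # similar, no typed edge: a candidate
    "claim_c": [0.0, 1.0],    # orthogonal: below the threshold
    "concept_d": [1.0, 0.1],  # similar, but already linked by a typed edge
}
result = similar_untyped("finding_a", embeddings, typed_neighbors={"concept_d"})
print(result)
```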

Manifold geometry provides a practical steering blueprint in an image-action model predicting car position on a hill, extending results across modalities. · finding · 0.800
Cross-modality result from the full paper demonstrating that representation-behavior geometry alignment is not limited to language models.
Linear steering produces noisy off-target effects; manifold steering cleanly shifts probability mass between sequential concepts. · finding · 0.780
Core empirical claim comparing steering approaches on cyclic concepts.
Steering along manifolds provides better control than linear steering when the concept geometry is non-linear. · claim · 0.774
The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
Manifold-respecting steering produces smooth natural behavioral trajectories while linear steering teleports between non-adjacent concepts. · claim · 0.765
Manifold steering produces clean probability shifts along the natural behavior structure; linear steering cuts across the manifold and produces noisy off-target effects. · finding · 0.763
Empirical demonstration on Llama-3.1-8B that steering along representation manifold aligns outputs with behavior manifold, whereas linear steering does not.
If the manifold-steering prototype shows positive results, About Blank will commit to the Company shape (not Lab) at the 12-month fork decision. · prediction · 0.759
Gap: linear steering assumes Euclidean geometry and does not account for the actual curved geometry of activation manifolds. · concept · 0.759
The research gap that motivates manifold steering as an alternative to conventional linear approaches.
We hypothesize that a similar 'brutal' and purely geometric process always occurs somewhere in other kinds of unfolding that generate living order — in poetry, dance, social structure, planning, and family relationships. · hypothesis · 0.752
Extends the brutal geometry thesis beyond architecture into all creative and social domains; acknowledged as not yet confirmed with certainty.