finding
active
finding:in-the-mountain-car-case-study-car-position-is-a-1d-manifold-linear-interventions-cross-voids-causing-incoherence-following-the-1d-curve-produces-smooth-control
In the Mountain Car case study, car position is a 1D manifold; linear interventions cross voids causing incoherence; following the 1D curve produces smooth control.
Empirical demonstration that a semantically meaningful variable is encoded as a curved manifold, and that respecting its geometry is critical for effective intervention.
Source paper
extracted_from (2026) · Geiger, Atticus · Lubana, Ekdeep Singh · Fel, Thomas · Merullo, Jack +3
Neighborhood — ranked by edge-count
Claims (3)
claim
- Proposes that nonlinear geometric structure is superior to linear feature spaces for capturing semantic content.
- Mechanistic explanation: geometric structure emerges naturally from standard training on data with underlying structure.
- General principle derived from the Mountain Car experiment: curved manifold-following yields coherent manipulation, linear shortcuts fail.
Communities (2)
community
- Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
- Concepts encoded as curved manifolds and circular structures in LLM activation spaces.
Concepts (8)
concept
- Mountain Car Case Study (cites, implements): Benchmark demonstrating that car position is encoded as a 1D manifold; linear interventions fail by crossing voids; following the geometric curve produces smooth control.
- The Void (cites): The property that the most profound centers have at their heart a void like water, infinite in depth, surrounded by and contrasted with the clutter around it; the calm emptiness needed by every center to give it the basis of its strength.
- Activation space (cites): Representation space on which linear probes operate to attribute harmful behaviors to training data.
- 1D manifold (cites): A single continuous curve in activation space encoding a single variable, such as car position in the Mountain Car case.
- incoherence (cites): Nonsensical or unphysical model outputs that result when interventions cross voids in activation space.
- linear intervention (cites): Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
- smooth control (cites): Coherent, predictable changes in model behavior achieved by navigating along the learned manifold rather than using straight-line interventions.
- car position (cites): The scalar variable representing the car's location in the Mountain Car reinforcement learning problem, found to be encoded as a 1D manifold.
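The contrast between a linear intervention and a manifold-following one can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the half-circle `curve` stands in for a learned 1D manifold, and `linear_path`, `manifold_path`, and `distance_to_curve` are hypothetical helpers, not the method or data from the source paper. The sketch only shows the geometric point that a straight line between two on-manifold points leaves the curve, while moving the underlying parameter stays on it.

```python
import numpy as np

def curve(t):
    """Hypothetical 1D manifold: a unit half circle in 2D, parameterised by t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    return np.stack([np.cos(np.pi * t), np.sin(np.pi * t)], axis=-1)

def distance_to_curve(points, n_ref=2001):
    """Distance from each point to the nearest of n_ref samples on the curve."""
    ref = curve(np.linspace(0.0, 1.0, n_ref))
    diffs = points[:, None, :] - ref[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

def linear_path(t_start, t_end, n_steps=51):
    """Straight-line interpolation between two on-manifold points."""
    a, b = curve(t_start), curve(t_end)
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * a + alphas * b

def manifold_path(t_start, t_end, n_steps=51):
    """Interpolation in the curve parameter, so every step lies on the manifold."""
    return curve(np.linspace(t_start, t_end, n_steps))

# Largest distance from the manifold reached along each path.
linear_gap = distance_to_curve(linear_path(0.1, 0.9)).max()
manifold_gap = distance_to_curve(manifold_path(0.1, 0.9)).max()
print(f"linear path leaves the curve by up to {linear_gap:.3f}")
print(f"manifold path leaves the curve by up to {manifold_gap:.3f}")
```

In this toy setting the straight-line path cuts across the empty interior of the arc, which plays the role of a void, whereas the parameter-space path never leaves the curve. Whether off-manifold points actually yield incoherent outputs depends on the model and is not something this sketch can show.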
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Cross-modality result from the full paper demonstrating that representation-behavior geometry alignment is not limited to language models.
- Core empirical claim comparing steering approaches on cyclic concepts.
- The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
- Empirical demonstration on Llama-3.1-8B that steering along representation manifold aligns outputs with behavior manifold, whereas linear steering does not.
- The research gap that motivates manifold steering as an alternative to conventional linear approaches
- Extends the brutal geometry thesis beyond architecture into all creative and social domains; acknowledged as not yet confirmed with certainty