concept
active
concept:manifold
manifold
A smooth, potentially curved surface in activation space along which activations vary according to a coherent semantic dimension.
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Conceptual scheme introduced in this paper: neural networks develop internal geometric representations that mirror real-world geometry, providing the right level of description for interpretability and control.
Claims (2)
claim
- Core critique of sparse autoencoders: they break the geometric structure of representations, making it harder to see the big picture.
- General principle derived from the Mountain Car experiment: following the curved manifold yields coherent manipulation, whereas linear shortcuts fail.
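The contrast between manifold-following and a linear shortcut can be illustrated with a toy sketch. It assumes a unit circle in a 2D space as a stand-in for a curved one-dimensional manifold; the function names are hypothetical, and this is not the paper's method or a reproduction of the Mountain Car experiment. It only shows that a straight-line path between two on-manifold points can leave the manifold, while a path that moves along the underlying coordinate stays on it.

```python
import math

def on_circle(theta):
    """Point on the unit circle: a toy 1D manifold embedded in a 2D space."""
    return (math.cos(theta), math.sin(theta))

def linear_path(a, b, t):
    """Straight-line interpolation between two points (the linear shortcut)."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))

def manifold_path(theta_a, theta_b, t):
    """Interpolate the angle, then map back onto the circle (manifold-following)."""
    return on_circle((1 - t) * theta_a + t * theta_b)

def distance_from_manifold(p):
    """Distance of a point from the unit circle."""
    return abs(math.hypot(*p) - 1.0)

# Two opposite points on the circle (assumed toy endpoints).
theta_a, theta_b = 0.0, math.pi
start, end = on_circle(theta_a), on_circle(theta_b)

# Compare the halfway point of each path.
linear_error = distance_from_manifold(linear_path(start, end, 0.5))
manifold_error = distance_from_manifold(manifold_path(theta_a, theta_b, 0.5))

print(f"linear midpoint, distance from manifold:   {linear_error:.3f}")
print(f"manifold midpoint, distance from manifold: {manifold_error:.3f}")
```

In this toy setting the linear midpoint lands at the center of the circle, as far from the manifold as an interior point can be, while the manifold-following midpoint remains on the circle.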
Concepts (1)
concept
- curved manifold (related_to): A smoothly varying lower-dimensional surface in activation space that captures a concept better than a straight linear direction.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The type of manifold fitted to the cyclic concept structure in both activation and behavior space — a path along which steering moves the model.
- Technique used to fit M_h and M_y from data; enables manifold steering.
- An interpretability approach that describes representations in terms of entire curved manifolds rather than many small features.
- Hypothesized extension of superposition in which features may be higher-dimensional manifolds rather than 1D directions.
- One-dimensional curved surface in output probability space; the paper shows this mirrors representation manifold structure.
- A single continuous curve in activation space encoding a single variable, such as car position in the Mountain Car case.
- One-dimensional curved surface in internal activation space; the paper demonstrates alignment with behavior manifold.
- The low-dimensional geometric structure discovered in neural activation space; contrasted with linear/Euclidean geometry.