concept
active
concept:manifold

manifold

A smooth, potentially curved surface in activation space along which activations vary according to a coherent semantic dimension.
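To make the definition concrete, here is a minimal sketch under loudly stated assumptions. It uses a toy cyclic variable (for example a month index 0..11), a hypothetical 3-D "activation space", and an illustrative RADIUS. The variable is embedded as a circle, which is a smooth, curved 1-D manifold, and it can be read back from the point's position along that circle. The names `embed`, `decode`, `RADIUS` and `PERIOD` are hypothetical and do not come from any particular model or paper.

```python
from __future__ import annotations

import math

# Hypothetical toy setup: a cyclic semantic variable (e.g. a month index 0..11)
# embedded as a circle in a 3-D "activation space". Real models use far higher
# dimensions; RADIUS and the embedding plane are illustrative assumptions.
RADIUS = 2.0
PERIOD = 12


def embed(k: float) -> tuple[float, float, float]:
    """Map a (possibly fractional) position k on the cycle to a point on the manifold."""
    theta = 2 * math.pi * k / PERIOD
    # The circle lies in the first two coordinates and the third is held constant,
    # so the manifold is a smooth, curved 1-D curve inside 3-D space.
    return (RADIUS * math.cos(theta), RADIUS * math.sin(theta), 1.0)


def decode(point: tuple[float, float, float]) -> float:
    """Recover the semantic variable from the angle of the point in the circle's plane."""
    theta = math.atan2(point[1], point[0]) % (2 * math.pi)
    return theta * PERIOD / (2 * math.pi)


if __name__ == "__main__":
    # Each position on the cycle maps to a distinct point and decodes back to itself.
    for k in range(PERIOD):
        print(k, round(decode(embed(k)), 6))
```

Nearby values of k land on nearby points, which is the "coherent semantic dimension" in the definition: moving along the curve changes the encoded variable smoothly.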

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Conceptual scheme introduced in this paper: neural networks develop internal geometric representations that mirror real-world geometry, providing the right level of description for interpretability and control.

Claims (2)

claim

Concepts (1)

concept
  • curved manifold
    related_to
    A smoothly varying lower-dimensional surface in activation space that captures a concept better than a straight linear direction.
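The contrast between a curved manifold and a straight linear direction can be illustrated with a small sketch. It assumes a toy cyclic concept on a unit circle in a hypothetical 2-D slice of activation space. A single linear readout (a dot product with one direction) folds the circle onto a line segment, so mirror-image positions receive the same value. Reading the angle along the curve keeps them apart. All names here (`project`, `angle_readout`, `PERIOD`) are illustrative, not a published method.

```python
import math

# Hypothetical toy setup: 12 positions of a cyclic concept placed on a unit circle
# in a 2-D slice of activation space (an illustrative assumption, not data).
PERIOD = 12
points = [(math.cos(2 * math.pi * k / PERIOD), math.sin(2 * math.pi * k / PERIOD))
          for k in range(PERIOD)]


def project(point, direction):
    """Read the concept off a single straight linear direction (a dot product)."""
    return point[0] * direction[0] + point[1] * direction[1]


def angle_readout(point):
    """Read the concept off its position along the curved manifold (the angle)."""
    return (math.atan2(point[1], point[0]) % (2 * math.pi)) * PERIOD / (2 * math.pi)


# Any other unit direction also folds the circle onto a line segment, so it too
# maps distinct points of the continuous circle to the same value.
direction = (1.0, 0.0)
linear = [project(p, direction) for p in points]
curved = [angle_readout(p) for p in points]

# Positions k and PERIOD - k are mirror images across the x-axis, so the linear
# readout gives them the same value, while the angle readout separates them.
print(round(linear[1], 6), round(linear[11], 6))
print(round(curved[1], 6), round(curved[11], 6))
```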

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The type of manifold fitted to the cyclic concept structure in both activation and behavior space — a path along which steering moves the model.
  • manifold learning · framework · 0.845
    Technique used to fit M_h and M_y from data; enables manifold steering.
  • An interpretability approach that describes representations in terms of entire curved manifolds rather than many small features.
  • Feature Manifolds · framework · 0.821
    Hypothesized extension of superposition in which features may be higher-dimensional manifolds rather than 1D directions.
  • behavior manifold · concept · 0.799
    One-dimensional curved surface in output probability space; the paper shows that it mirrors the structure of the representation manifold.
  • 1D manifold · concept · 0.795
    A single continuous curve in activation space that encodes one variable, such as the car's position in the Mountain Car case.
  • One-dimensional curved surface in internal activation space; the paper demonstrates its alignment with the behavior manifold.
  • The low-dimensional geometric structure discovered in neural activation space; contrasted with linear/Euclidean geometry.
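Several entries in this neighborhood describe steering as moving the model along a fitted manifold rather than along a straight line. The following is a minimal sketch of why that distinction matters, assuming a toy cyclic concept on a circle in a hypothetical 2-D slice of activation space. Straight-line interpolation between two on-manifold activations cuts through the circle's interior. Interpolating the concept's position and then embedding each step stays on the manifold. The helpers `linear_path`, `manifold_path` and `off_manifold` and the constants `SRC`, `DST` are hypothetical illustrations, not the steering procedure of any specific paper.

```python
import math

# Hypothetical toy setup: a cyclic concept on a circle of radius RADIUS in a 2-D
# slice of activation space. "Steering" here means producing a sequence of
# activations that moves the concept from position SRC to position DST; this is
# an illustrative assumption, not the procedure used in any particular paper.
RADIUS = 1.0
PERIOD = 12
SRC, DST = 0, 5


def embed(k):
    """Place cycle position k (possibly fractional) on the circle."""
    theta = 2 * math.pi * k / PERIOD
    return (RADIUS * math.cos(theta), RADIUS * math.sin(theta))


def linear_path(a, b, steps):
    """Straight-line interpolation between two activations (ignores the curvature)."""
    return [(a[0] + (b[0] - a[0]) * t / steps, a[1] + (b[1] - a[1]) * t / steps)
            for t in range(steps + 1)]


def manifold_path(k_src, k_dst, steps):
    """Interpolate the concept's position on the manifold, then embed each step."""
    return [embed(k_src + (k_dst - k_src) * t / steps) for t in range(steps + 1)]


def off_manifold(point):
    """Distance from a point to the circle, i.e. how far steering left the manifold."""
    return abs(math.hypot(point[0], point[1]) - RADIUS)


steps = 10
lin = linear_path(embed(SRC), embed(DST), steps)
man = manifold_path(SRC, DST, steps)
print("max off-manifold, linear:  ", round(max(off_manifold(p) for p in lin), 4))
print("max off-manifold, manifold:", round(max(off_manifold(p) for p in man), 4))
```

Both paths share the same endpoints. Only the manifold path keeps every intermediate activation on the curve, which is the property the manifold-steering entries above rely on.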