framework
active
framework:feature-manifolds

Feature Manifolds

Hypothesized extension of superposition where features may be higher-dimensional manifolds rather than 1D directions

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Prior model of superposition where features are discrete 1D objects repelling each other roughly evenly; paper argues this is incomplete

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • manifold · concept · 0.821
    A smooth, potentially curved surface in activation space along which activations vary according to a coherent semantic dimension.
  • One-dimensional curved surface in internal activation space; the paper demonstrates alignment with behavior manifold.
  • Extension of superposition hypothesis to account for continuous families of features
  • An interpretability approach that describes representations in terms of entire curved manifolds rather than many small features.
  • Research thread within About Blank concerning the structure and relational properties of neural network feature representations; covariance pooling tangentially supports this thread.
  • manifold learning · framework · 0.771
    Technique used to fit M_h and M_y from data; enables manifold steering.
  • curved manifold · concept · 0.762
    A smoothly varying lower-dimensional surface in activation space that captures a concept better than a straight linear direction.
  • The type of manifold fitted to the cyclic concept structure in both activation and behavior space — a path along which steering moves the model.
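
Several entries above contrast a curved, cyclic manifold with a single straight direction. The following is a minimal toy sketch of that contrast, not the method of any referenced paper. It assumes a cyclic concept embedded as evenly spaced points on a unit circle in a 2D "activation space". The names `circle_points`, `variance_along`, and `recover_angle` are hypothetical, chosen only for illustration.

```python
import math

def circle_points(n):
    """n evenly spaced points on the unit circle: a toy cyclic feature manifold."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def variance_along(points, angle):
    """Fraction of total variance captured by projecting onto one unit direction."""
    ux, uy = math.cos(angle), math.sin(angle)
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    projected = sum(((x - mx) * ux + (y - my) * uy) ** 2 for x, y in points)
    total = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points)
    return projected / total

def recover_angle(point):
    """Position along the manifold: the single coordinate that parametrizes it."""
    return math.atan2(point[1], point[0]) % (2 * math.pi)

points = circle_points(12)

# Scan candidate linear directions (0 to 179 degrees) and keep the best one.
best_linear = max(variance_along(points, math.radians(d)) for d in range(180))

# The manifold coordinate, by contrast, identifies every point exactly.
angles = [recover_angle(p) for p in points]
```

In this toy setup no single linear direction captures more than half of the variance, while the one angle returned by `recover_angle` locates each point on the circle exactly. Real activation spaces are much higher-dimensional and noisier, so this illustrates only the geometric intuition.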