concept
active
concept:activation-manifold-m-h
activation manifold M_h
Manifold fitted to representations in activation space.
Neighborhood — ranked by edge-count
Concepts (5)
- Activation Manifold [related_to, same_as]: The low-dimensional geometric structure discovered in neural activation space; contrasted with linear/Euclidean geometry.
- Manifold Steering [implements]: Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
- Activation space [associated_with]: Representation space on which linear probes operate to attribute harmful behaviors to training data.
- Neural Representation Geometry [associated_with]: The broader conceptual framework that neural activations exhibit non-Euclidean geometric structure causally linked to behavior.
- behavior manifold M_y [associated_with]: Manifold fitted to output probability distributions (behavior).
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Method to fit a manifold M_h to neural representations in activation space.
- A smooth, potentially curved surface in activation space along which activations vary according to a coherent semantic dimension.
- Method to fit a manifold M_y to output probability distributions.
- Central empirical result showing causal coupling between representation and behavior geometry across multiple substrates and modalities.
- Demonstrates bidirectional causal link: behavior manifold geometry can be recovered by optimizing in representation space.
- Technique used to fit M_h and M_y from data; enables manifold steering.