concept
active
concept:activation-manifold-m-h

activation manifold M_h

Manifold fitted to representations in activation space.

Neighborhood — ranked by edge-count

Concepts (5)

  • Activation Manifold
    related_to · same_as
    The low-dimensional geometric structure discovered in neural activation space; contrasted with linear/Euclidean geometry.
  • Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
  • Activation space
    associated_with
    Representation space on which linear probes operate to attribute harmful behaviors to training data.
  • The broader conceptual framework that neural activations exhibit non-Euclidean geometric structure causally linked to behavior.
  • behavior manifold M_y
    associated_with
    Manifold fitted to output probability distributions (behavior).

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.