framework
active
framework:geometry-aware-steering-framework
Geometry-Aware Steering Framework
The overarching theoretical framework proposed in the paper, asserting that steering interventions should be aligned with the geometric structure of the model's representation manifold.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (3)
concept
- Manifold Steering · implements · Central framework: steering neural networks by intervening along the curved manifold where a concept lives, rather than in straight lines through activation space.
- One-dimensional curved surface in output probability space; the paper shows that it mirrors the structure of the representation manifold.
- One-dimensional curved surface in internal activation space; the paper demonstrates its alignment with the behavior manifold.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Paradigm of finding the right geometry (manifold) for principled control.
- Conceptual scheme introduced in this paper: neural networks develop internal geometric representations that mirror real-world geometry, providing the right level of description for interpretability and control.
- The paper's primary contribution: performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units using psychological artifacts.
- Linear steering implicitly assumes a flat, Euclidean activation space, leading to off-manifold excursions.
- The main framework proposed for retrieving and steering high-order semantic features in LLMs via sparse autoencoders.
- Paradigm of finding the right direction in activation space (e.g., linear steering).
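
The contrast drawn above between straight-line (linear) steering and steering along a curved manifold can be illustrated with a toy sketch. This is not the paper's method: it assumes, purely for illustration, that the concept manifold is the unit circle in a two-dimensional activation space, and the function names `linear_steer`, `manifold_steer`, and `off_manifold_distance` are hypothetical.

```python
import math

def linear_steer(h, direction, alpha):
    """Straight-line steering: add alpha times a fixed unit direction to h."""
    norm = math.hypot(*direction)
    return (h[0] + alpha * direction[0] / norm,
            h[1] + alpha * direction[1] / norm)

def manifold_steer(h, alpha):
    """Steer along the toy manifold (assumed to be the unit circle) by arc length alpha."""
    theta = math.atan2(h[1], h[0])
    return (math.cos(theta + alpha), math.sin(theta + alpha))

def off_manifold_distance(h):
    """Distance from h to the unit circle; zero means h is on the toy manifold."""
    return abs(math.hypot(*h) - 1.0)

h0 = (1.0, 0.0)        # starting activation, on the circle
tangent = (0.0, 1.0)   # tangent to the circle at h0, used as the linear steering direction

for alpha in (0.1, 0.5, 1.0):
    lin = linear_steer(h0, tangent, alpha)
    man = manifold_steer(h0, alpha)
    print(f"alpha={alpha}: linear off-manifold={off_manifold_distance(lin):.3f}, "
          f"manifold off-manifold={off_manifold_distance(man):.3f}")
```

In this toy setting the linear step drifts further from the circle as `alpha` grows, while the along-curve step stays on it, which is the off-manifold excursion that the linear-steering entry describes.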