question
active
question:what-if-the-concept-being-manipulated-does-not-lie-on-a-straight-line-in-the-model-s-representations
What if the concept being manipulated does not lie on a straight line in the model's representations?
The motivating question that opens the paper and leads to the development of manifold steering.
Source paper
extracted_from
Neighborhood (ranked by edge-count)
Papers (1)
paper
Claims (1)
claim
- The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Key methodological insight: introspection enables a new probe validation criterion beyond conventional separation metrics.
- Third core research question motivating the CL loss approach in Section 5.
- Key interpretive claim that deception has a tractable geometric signature in activation space.
- The causal hypothesis motivating the use of causality (intervention) as the lens connecting representation and behavior geometry.
- Interpretive synthesis of DIM and cone intervention successes.
- Author’s interpretive claim that the shared geometry is general and robust.
- The core motivating question of the paper, framed by Christiano et al. (2021).