claim
active
claim:linear-steering-is-often-mismatched-with-a-model-s-internal-representation-geometry-producing-noisy-off-target-effects
Linear steering is often mismatched with a model's internal representation geometry, producing noisy, off-target effects.
The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
Source paper
extracted_from
Neighborhood: ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Empirical result demonstrating the failure mode of linear steering when concept geometry is cyclic.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core empirical claim comparing steering approaches on cyclic concepts.
- Attribution of the failure to a Euclidean assumption.
- Empirical demonstration on Llama-3.1-8B that steering along the representation manifold aligns outputs with the behavior manifold, whereas linear steering does not.
- The research gap that motivates manifold steering as an alternative to conventional linear approaches.
- Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
- The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
- Supported by the instruction discovery experiments comparing steering vs. embedding baselines.
- Observation that 100% accuracy on specific concept-layer-strength combinations suggests concept-specific detectability.
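
The contrast drawn in these entries, between adding a scaled steering vector and following a cyclic concept geometry, can be illustrated with a toy sketch. This is not the paper's method or data. It assumes, purely for illustration, that seven day-of-week concepts sit evenly spaced on a unit circle in 2-D, and the names `linear_steer` and `circular_steer` are hypothetical.

```python
import numpy as np

# Toy assumption (not from the paper): seven day-of-week concepts
# placed evenly on a unit circle in a 2-D representation space.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(len(DAYS)) / len(DAYS)
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)


def linear_steer(h, v, alpha):
    """Linear steering: add a scaled steering vector to the representation."""
    return h + alpha * v


def circular_steer(h, theta):
    """Hypothetical geometry-aware step for this toy: rotate h by theta."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ h


src, dst = 0, 3  # steer from Mon toward Thu
v = points[dst] - points[src]  # difference-of-means style steering vector

# Move halfway in each scheme and compare how far the result is from the circle.
half_linear = linear_steer(points[src], v, alpha=0.5)
half_circular = circular_steer(points[src], 0.5 * (angles[dst] - angles[src]))

linear_norm = np.linalg.norm(half_linear)
circular_norm = np.linalg.norm(half_circular)

print("norm after linear half-step:  ", linear_norm)
print("norm after circular half-step:", circular_norm)
```

In this toy setup the linear half-step lands on the chord between the two days, well inside the circle, so it does not correspond to any day's representation. The rotation stays on the circle throughout. Real model representations are higher-dimensional and noisier, so this only shows the geometric intuition behind the claim.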