claim

active

claim:linear-steering-is-often-mismatched-with-a-model-s-internal-representation-geometry-producing-noisy-off-target-effects

Linear steering is often mismatched with a model's internal representation geometry, producing noisy, off-target effects.

The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
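A worked toy calculation shows why a straight-line push can leave a cyclic geometry. It assumes the seven days lie on a circle of radius r. The real representation geometry in Llama is not specified on this page. It also assumes the steering vector v is the difference between two day embeddings. Linear steering adds a scaled vector, h' = h + αv. Halfway along that path the norm collapses toward the centre of the circle. Interpolating the angle instead keeps the norm fixed at r.

```latex
% Toy assumption: day k sits on a circle of radius r
e_k = r\,(\cos\theta_k,\ \sin\theta_k), \qquad \theta_k = \frac{2\pi k}{7}

% Linear steering from day i toward day j, with v = e_j - e_i
h(\alpha) = e_i + \alpha\,(e_j - e_i)

% Norm at the midpoint, using \|e_i + e_j\|^2 = 2r^2(1 + \cos\Delta\theta) = 4r^2\cos^2(\Delta\theta/2)
\left\| h(\tfrac{1}{2}) \right\| = r \left| \cos\frac{\Delta\theta}{2} \right|, \qquad \Delta\theta = \theta_j - \theta_i

% Monday (k = 0) to Thursday (k = 3): \Delta\theta = 6\pi/7
\left\| h(\tfrac{1}{2}) \right\| = r\cos\frac{3\pi}{7} \approx 0.22\,r

% Angle interpolation (toy "steering along the manifold") stays on the circle
h_{\mathrm{manifold}}(\alpha) = r\,\big(\cos(\theta_i + \alpha\Delta\theta),\ \sin(\theta_i + \alpha\Delta\theta)\big), \qquad \|h_{\mathrm{manifold}}(\alpha)\| = r
```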

Source paper

extracted_from

Steering Along Manifolds to Control Neural Networks

Neighborhood — ranked by edge-count

Papers (1)

paper

Steering Along Manifolds to Control Neural Networks
introduces

Findings (1)

finding

Linear steering on Llama-3.1-8B for the days-of-week task cuts across the behavior manifold, producing noisy, off-target effects in which the predicted tokens are not even days of the week.
supports
Empirical result demonstrating the failure mode of linear steering when concept geometry is cyclic.
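The failure mode in this finding can be reproduced in a small toy model. This is only a sketch, not the paper's setup. Everything in it is a hypothetical stand-in:

- **Vocabulary.** The seven days are placed on the unit circle in 2D, and one non-day token, `"the"`, sits at the origin.
- **Readout.** A nearest-neighbour `decode` stands in for the model's unembedding.
- **Manifold steering.** `manifold_steer` simply interpolates the angle. It is not the paper's manifold-steering algorithm.

The sketch steers from Monday toward Thursday in both ways and reads out the token at each step.

```python
import math

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_point(k, radius=1.0):
    """Place day k on a circle: a toy stand-in for a cyclic concept geometry."""
    theta = 2 * math.pi * k / len(DAYS)
    return (radius * math.cos(theta), radius * math.sin(theta))


# Hypothetical vocabulary: the seven days on the unit circle, plus one
# non-day token ("the") at the origin, i.e. off the day manifold.
VOCAB = {day: day_point(k) for k, day in enumerate(DAYS)}
VOCAB["the"] = (0.0, 0.0)


def decode(h):
    """Nearest-neighbour readout: return the vocabulary token closest to h."""
    return min(VOCAB, key=lambda tok: math.dist(h, VOCAB[tok]))


def linear_steer(h, v, alpha):
    """Linear steering: h' = h + alpha * v (a straight line in activation space)."""
    return (h[0] + alpha * v[0], h[1] + alpha * v[1])


def manifold_steer(k_src, k_dst, alpha):
    """Toy manifold steering: interpolate the angle so h stays on the circle."""
    theta = 2 * math.pi * (k_src + alpha * (k_dst - k_src)) / len(DAYS)
    return (math.cos(theta), math.sin(theta))


src, dst = 0, 3  # Monday -> Thursday
h = VOCAB[DAYS[src]]
# Steering vector assumed to be the difference between the two day embeddings.
v = (VOCAB[DAYS[dst]][0] - h[0], VOCAB[DAYS[dst]][1] - h[1])
alphas = [i / 5 for i in range(6)]  # 0.0, 0.2, ..., 1.0

linear_path = [decode(linear_steer(h, v, a)) for a in alphas]
manifold_path = [decode(manifold_steer(src, dst, a)) for a in alphas]

print("linear:  ", linear_path)
print("manifold:", manifold_path)
```

In this toy, the straight line passes near the origin. Midway it decodes to the non-day token: `['Monday', 'Monday', 'the', 'the', 'Thursday', 'Thursday']`. The angle-interpolated path moves through the intermediate days in order: `['Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Wednesday', 'Thursday']`. The resemblance to the Llama result is qualitative only. The numbers come from the toy geometry, not from the paper.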

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Linear steering produces noisy off-target effects; manifold steering cleanly shifts probability mass between sequential concepts. · finding · 0.889
Core empirical claim comparing steering approaches on cyclic concepts.
Linear steering cuts through off-manifold regions and hence produces unnatural outputs. · claim · 0.858
Attributes the failure to the Euclidean assumption.
Manifold steering produces clean probability shifts along natural behavior structure; linear steering cuts across the manifold and produces noisy, off-target effects · finding · 0.849
Empirical demonstration on Llama-3.1-8B that steering along representation manifold aligns outputs with behavior manifold, whereas linear steering does not.
Gap: linear steering assumes Euclidean geometry and does not account for the actual curved geometry of activation manifolds · concept · 0.841
The research gap that motivates manifold steering as an alternative to conventional linear approaches.
linear steering · method · 0.822
Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
Steering along manifolds provides better control than linear steering when the concept geometry is non-linear. · claim · 0.822
The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
Steering vectors capture latent dimensions of reflective behavior more faithfully than surface-level embedding similarity. · claim · 0.820
Supported by the instruction discovery experiments comparing steering vs. embedding baselines.
Some steering vectors produce more salient perturbations than others, perhaps because of shared semantic or qualitative factors · claim · 0.808
Based on the observation of 100% accuracy for specific concept-layer-strength combinations, which suggests that detectability is concept-specific.