finding
active
Linear steering on Llama-3.1 8B for the days-of-week task cuts across the behavior manifold, producing noisy off-target effects where predicted tokens are not even days of the week.
Empirical result demonstrating the failure mode of linear steering when concept geometry is cyclic.
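The geometric intuition behind this failure mode can be sketched with a toy example. The snippet below is an illustration under stated assumptions, not the paper's method and not Llama's actual representation: it assumes the seven days sit evenly spaced on a 2-D unit circle, and the helper names (`day_point`, `linear_steer`, `manifold_steer`) are hypothetical. It compares a straight-line interpolation between two days with an interpolation of the angle.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_point(i):
    """Toy embedding (assumption): day i sits at angle 2*pi*i/7 on the unit circle."""
    theta = 2 * math.pi * i / len(DAYS)
    return (math.cos(theta), math.sin(theta))


def linear_steer(src, dst, t):
    """Straight-line interpolation between two day points (follows the chord)."""
    a, b = day_point(src), day_point(dst)
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])


def manifold_steer(src, dst, t):
    """Interpolate the angle instead, so the point stays on the circle."""
    theta = 2 * math.pi * ((1 - t) * src + t * dst) / len(DAYS)
    return (math.cos(theta), math.sin(theta))


def radius(p):
    """Distance from the origin; 1.0 means the point is on the toy manifold."""
    return math.hypot(p[0], p[1])


# Steer halfway from Monday (index 0) to Thursday (index 3).
mid_linear = linear_steer(0, 3, 0.5)
mid_manifold = manifold_steer(0, 3, 0.5)

print("linear midpoint radius:  ", radius(mid_linear))
print("manifold midpoint radius:", radius(mid_manifold))
```

In this toy setting the linear midpoint falls well inside the circle, far from every day's point, while the angular midpoint stays on the circle between Tuesday and Wednesday. That is only an analogy for the off-target predictions described above; the real representations are high-dimensional and need not form an exact circle.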
Source paper
extracted_from
Neighborhood, ranked by edge-count
Papers (1)
paper
Claims (2)
claim
- Steering along manifolds provides better control than linear steering when the concept geometry is non-linear. (associated_with, supports) The central thesis of the paper, motivating the shift from linear to geometry-aware manifold steering.
- The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core empirical result demonstrating that manifold steering produces on-target, behavior-aligned outputs.
- Core empirical claim comparing steering approaches on cyclic concepts.
- Empirical observation establishing that Llama's behavior for days-of-week tasks has circular structure.
- Empirical demonstration on Llama-3.1-8B that steering along representation manifold aligns outputs with behavior manifold, whereas linear steering does not.
- Illustrative finding that ESR mitigates but does not fully eliminate steering influence.
- Attribution of failure to Euclidean assumption.
- Empirical observation establishing that Llama's internal representations for days-of-week have circular geometric structure.
- Model-specific difference in persona susceptibility.