finding
active
finding:stepwise-steering-achieves-over-5-accuracy-improvement-compared-to-all-token-intervention-at-similar-token-budget
Stepwise steering achieves over 5% accuracy improvement compared to all-token intervention at similar token budget
Key result demonstrating advantage of stepwise over all-token steering strategy
Source paper
extracted_from (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Neighborhood — ranked by edge-count
Claims (1)
claim
- Comparative claim between the two steering strategies
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Key asymmetry finding: suppressing reflection is easier than inducing it.
- Core validation that identified latent directions correspond to meaningful control over reflective behavior.
- Robustness check on token choice for binary classification
- Empirical comparison showing advantage of SAE features in low-data regime.
- Shows that activation steering does not fully replicate mechanisms triggered by explicit prompting.
- Maximum token savings achieved by ReflCtrl on non-mathematical general reasoning tasks
- Activation steering works on SDF-only model organism (before expert iteration) with steering strength 0.4 · finding · 0.761 · Replicates main result on simpler model; qualitatively similar patterns.
- Practical finding for optimizing steering setup.
Restated by (1)
cosine ≥ 0.90
Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.