claim
active
claim:distributing-steering-strength-across-multiple-layers-6-layers-at-0-6-each-is-more-effective-and-less-accuracy-damaging-than-concentrating-the-same-total-strength-in-one-layer
Distributing steering strength across multiple layers (6 layers at 0.6 each) is more effective and less accuracy-damaging than concentrating the same total strength in one layer
Practical finding for optimizing steering setup.
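To make the comparison concrete, here is a minimal sketch of the two configurations, assuming steering means adding a scaled direction vector to the residual stream at selected layers. The layer indices, the 24-layer depth, the 8-dimensional hidden state, and the helper names `steering_schedule` and `run_toy_residual_stream` are all hypothetical and not taken from the source paper. The toy "model" uses identity layers, so it only shows that both schedules inject the same total strength (6 × 0.6 = 3.6); it does not reproduce the claimed difference in effectiveness or accuracy, which would arise from the nonlinear layers of a real model.

```python
import numpy as np

def steering_schedule(layers, total_strength):
    """Split total_strength evenly across the given layer indices."""
    per_layer = total_strength / len(layers)
    return {layer: per_layer for layer in layers}

def run_toy_residual_stream(h0, n_layers, schedule, direction):
    """Toy forward pass: every layer is the identity (an assumption for
    illustration), plus the steering vector at scheduled layers."""
    h = h0.copy()
    for layer in range(n_layers):
        # A real model would apply attention and MLP blocks here.
        h = h + schedule.get(layer, 0.0) * direction
    return h

rng = np.random.default_rng(0)
direction = rng.normal(size=8)
direction /= np.linalg.norm(direction)  # unit-norm steering direction
h0 = rng.normal(size=8)                 # hypothetical initial hidden state

# Hypothetical layer choices: six middle layers vs. a single middle layer.
distributed = steering_schedule([10, 11, 12, 13, 14, 15], total_strength=3.6)
concentrated = steering_schedule([12], total_strength=3.6)

out_distributed = run_toy_residual_stream(h0, 24, distributed, direction)
out_concentrated = run_toy_residual_stream(h0, 24, concentrated, direction)
```

In this linear toy the two outputs coincide, which is the sense in which the total strength is "the same"; any advantage of the distributed schedule has to be measured on an actual model.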
Source paper
extracted_from · (2025) · Hua, Tim Tian · Qin, Andrew · Marks, Samuel · Nanda, Neel
Neighborhood — ranked by edge-count
Findings (1)
finding
- Demonstrates distributed steering is more effective and less accuracy-damaging than concentrated steering.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrates averaging multiple prompt pairs reduces noise; optimal subset selection further improves performance.
- Key asymmetry finding: suppressing reflection is easier than inducing it.
- Performance is best when skipping both the first and last six layers when applying the intervention · claim · 0.776 · Empirical configuration finding from ablation study on layer selection
- Supported by the instruction discovery experiments comparing steering vs. embedding baselines.
- Comparative claim between the two steering strategies
- Empirical comparison showing advantage of SAE features in low-data regime.
- Shows that activation steering does not fully replicate mechanisms triggered by explicit prompting.
- Argues against the single-layer analysis approach of prior work.