finding
active
finding:inhibition-steering-produces-larger-accuracy-drops-than-enhancement-steering-produces-accuracy-gains-across-all-models-and-datasets-tested
Inhibition steering produces larger accuracy drops than enhancement steering produces accuracy gains, across all models and datasets tested
Key asymmetry finding: suppressing reflection is easier than inducing it.
Source paper
extracted_from (2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Neighborhood — ranked by edge-count
Claims (2)
claim
- Key asymmetry finding interpreted mechanistically by the authors.
- Applied dual-use conclusion drawn from the paper's findings.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core validation that identified latent directions correspond to meaningful control over reflective behavior.
- Shows that activation steering does not fully replicate mechanisms triggered by explicit prompting.
- Key result demonstrating the advantage of the stepwise steering strategy over the all-token strategy.
- Comparative claim between the two steering strategies.
- Nuanced interpretive claim about the limits of steering as a mechanism for reflection enhancement.
- Applied security implication derived from the asymmetry finding.
- Mechanism claim supported by transcript analysis and the fact that the steering vector was extracted from a model that never writes type hints.
- Practical finding for optimizing steering setup.
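
The "Related by similarity" criterion above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function names, the dictionary of embeddings, and the set of typed edge pairs are all hypothetical, and only the 0.65 threshold comes from this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Rank entities that are similar to `target` but share no typed edge with it.

    embeddings:  hypothetical dict mapping entity id -> embedding vector
    typed_edges: hypothetical set of (entity_id, entity_id) pairs that already
                 have a typed relation, in either direction
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        # Skip entities already linked to the target by a typed relation.
        if (target, other) in typed_edges or (other, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((other, score))
    # Highest similarity first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Each returned pair is a candidate for a new typed edge or a possible duplicate; deciding which still needs a human or a separate classification step.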