finding
active
finding:inhibition-steering-produces-larger-accuracy-drops-than-enhancement-steering-produces-accuracy-gains-across-all-models-and-datasets-tested

Inhibition steering produces larger accuracy drops than enhancement steering produces accuracy gains, across all models and datasets tested

Key asymmetry finding: suppressing reflection is easier than inducing it.
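
To make the comparison concrete, here is a minimal sketch of how the two steering directions and the asymmetry could be expressed. It assumes the common additive form of activation steering, where a hidden state is shifted along a unit-normalized direction; this card does not spell out the paper's exact procedure. The names `steer`, `projection`, and `asymmetry` are hypothetical helpers introduced only for illustration.

```python
import math


def steer(hidden, direction, alpha):
    """Shift a hidden-state vector along a unit-normalized direction.

    Assumption: additive steering. alpha > 0 stands in for enhancement,
    alpha < 0 for inhibition. Not taken from the source paper.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    return [h + alpha * u for h, u in zip(hidden, unit)]


def projection(hidden, direction):
    """Scalar component of `hidden` along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, direction)) / norm


def asymmetry(acc_base, acc_enhanced, acc_inhibited):
    """Accuracy drop under inhibition minus accuracy gain under enhancement.

    A positive value matches the finding stated above: inhibition moves
    accuracy further than enhancement does.
    """
    gain = acc_enhanced - acc_base
    drop = acc_base - acc_inhibited
    return drop - gain
```

Under this sketch, the finding corresponds to `asymmetry(...)` being positive for every model and dataset pair tested, with the three accuracies measured on the same evaluation set.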

Source paper

extracted_from
Unveiling the Latent Directions of Reflection in Large Language Models
(2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.