finding
active

Activation steering interventions generally succeed in guiding performance toward the desired direction: relative to the unsteered baseline, enhancement increases accuracy and inhibition decreases it.

Core validation that the identified latent directions correspond to meaningful control over reflective behavior.
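As a rough illustration of what steering along a latent direction means, the sketch below adds a scaled unit vector to a hidden-state vector and checks how the projection onto that direction moves. This record does not describe the paper's exact procedure, so the additive form, the function names (`steer`, `projection`), the coefficient `alpha`, and the toy vectors are all assumptions made for illustration; it says nothing about accuracy, which would require running a real model.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def steer(hidden, direction, alpha):
    """Hypothetical additive steering: hidden + alpha * unit(direction).

    alpha > 0 sketches enhancement, alpha < 0 sketches inhibition,
    and alpha == 0 leaves the hidden state at the unsteered baseline.
    """
    d = unit(direction)
    return [h + alpha * di for h, di in zip(hidden, d)]

def projection(hidden, direction):
    """Scalar component of hidden along the unit direction."""
    d = unit(direction)
    return sum(h * di for h, di in zip(hidden, d))

# Toy values, not taken from the paper.
hidden = [0.5, -1.0, 2.0]
direction = [3.0, 0.0, 4.0]

baseline = projection(hidden, direction)
enhanced = projection(steer(hidden, direction, 2.0), direction)
inhibited = projection(steer(hidden, direction, -2.0), direction)
```

In this toy setup the projection shifts by exactly `alpha` in either direction, which is the geometric counterpart of the enhancement and inhibition conditions named in the finding.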

Source paper

extracted_from
Unveiling the Latent Directions of Reflection in Large Language Models
(2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
