method
active
method:no-steering-baseline-experiment
No-Steering Baseline Experiment
Control condition with steering disabled, used to confirm that self-correction is induced by steering rather than arising spontaneously.
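The comparison this control supports can be sketched as a per-condition rate of multi-attempt responses. This is a minimal illustration only: the `Trial` record, its `steered` and `n_attempts` fields, and the toy data are hypothetical and are not taken from the source experiment.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    steered: bool    # hypothetical field: was a steering vector applied?
    n_attempts: int  # hypothetical field: answer attempts counted in the response


def multi_attempt_rate(trials, steered):
    """Fraction of trials in one condition that contain more than one attempt."""
    subset = [t for t in trials if t.steered == steered]
    if not subset:
        return float("nan")
    return sum(t.n_attempts > 1 for t in subset) / len(subset)


# Toy data for illustration; these are not experimental results.
trials = [Trial(True, 2), Trial(True, 1), Trial(False, 1), Trial(False, 1)]
baseline_rate = multi_attempt_rate(trials, steered=False)
steered_rate = multi_attempt_rate(trials, steered=True)
```

If the baseline rate is zero while the steered rate is not, the multi-attempt behavior is plausibly attributable to steering rather than to spontaneous model behavior.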
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Control using objectively-NO factual questions under identical injection to measure global logit shift vs. genuine detection signal
- Constructing steering vectors from the difference of mean activations on positive and negative examples, for comparison.
- Paradigm of finding the right direction in activation space (e.g., linear steering).
- 0% multi-attempt responses across 7,892 no-steering baseline trials confirming ESR is steering-induced · finding · 0.748 · Control result establishing that self-correction is specifically induced by steering, not spontaneous model behavior
- Steering Vector Control maintains low unexpected rate of 0.08 in Experiment 1, comparable to baseline · finding · 0.731 · Shows that inducing deception via steering vectors preserves semantic coherence and does not cause random errors
- Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
- Shows gating effect is specific to the self-referential computational regime, not a general feature effect
- The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
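
The difference-of-means construction mentioned in the list above, together with the no-steering condition, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names, the additive update `h + alpha * v`, and the toy activations are hypothetical and are not drawn from any of the listed entities.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_of_means(pos, neg):
    """Steering vector: mean positive activation minus mean negative activation."""
    mu_pos = mean_vector(pos)
    mu_neg = mean_vector(neg)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(h, v, alpha):
    """Add the steering vector to a hidden state; alpha = 0 disables steering."""
    return [x + alpha * y for x, y in zip(h, v)]


# Toy activations for illustration only.
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [2.0, 1.0]]
v = diff_of_means(pos, neg)

h = [0.5, 0.5]
h_baseline = steer(h, v, alpha=0.0)  # no-steering control leaves h unchanged
h_steered = steer(h, v, alpha=2.0)
```

Setting `alpha` to zero corresponds to the no-steering baseline: the same code path runs, but the hidden state is left untouched.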