finding
active
finding:0-multi-attempt-responses-across-7-892-no-steering-baseline-trials-confirming-esr-is-steering-induced

0% multi-attempt responses across 7,892 no-steering baseline trials, confirming that ESR is steering-induced

Control result establishing that self-correction is specifically induced by steering rather than arising as spontaneous model behavior
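
To give a sense of how strong a zero count over 7,892 trials is, the sketch below computes an exact one-sided upper confidence bound on the underlying multi-attempt rate. It is illustrative only: the function name `zero_event_upper_bound`, the 95% confidence level, and the assumption that baseline trials are independent with a common rate are not taken from the source paper.

```python
import math


def zero_event_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on an event rate when 0 events occur in n_trials.

    Assumes independent trials with a common event probability p (an assumption
    made for this sketch, not a claim from the source). With zero events,
    the bound solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_trials)


# Trial count reported in the finding; zero multi-attempt responses were observed.
n_baseline_trials = 7892
upper = zero_event_upper_bound(n_baseline_trials)

# The "rule of three" (3 / n) is a common quick approximation to the same bound.
rule_of_three = 3.0 / n_baseline_trials

print(f"95% upper bound on baseline multi-attempt rate: {upper:.4%}")
print(f"Rule-of-three approximation: {rule_of_three:.4%}")
```

Under these assumptions the baseline multi-attempt rate is bounded at roughly 0.038%, which is consistent with reading the behavior as steering-induced rather than spontaneous.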

Source paper

extracted_from
Endogenous Resistance to Activation Steering in Language Models
(2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
