finding
active
Alternative tokenizations (Yes/No vs. yes/no vs. true/false) had no significant effect on steering outcomes or ASR
Robustness check on token choice for binary classification
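A minimal sketch of how such a token-choice check could be scored. It assumes the binary classifier reads the model's next-token logits and renormalizes over just the two answer tokens (a two-way softmax). The source does not state its scoring rule, and the names `TOKEN_PAIRS`, `binary_prob`, and `tokenization_spread` are hypothetical. ASR itself is not computed here.

```python
import math

# Hypothetical answer-token pairs mirroring the three variants named above.
TOKEN_PAIRS = [("Yes", "No"), ("yes", "no"), ("true", "false")]


def binary_prob(logits: dict[str, float], pos: str, neg: str) -> float:
    """P(positive answer), renormalized over only the two answer tokens."""
    a, b = logits[pos], logits[neg]
    m = max(a, b)  # subtract the max before exponentiating, for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def tokenization_spread(logits: dict[str, float]) -> float:
    """Largest gap in P(positive) across the token pairs; near 0 means token choice barely matters."""
    probs = [binary_prob(logits, p, n) for p, n in TOKEN_PAIRS]
    return max(probs) - min(probs)


# Illustrative logits only, not data from the source paper.
example = {"Yes": 2.0, "No": 0.0, "yes": 1.8, "no": -0.1, "true": 1.5, "false": 0.2}
print(round(tokenization_spread(example), 3))
```

Under this assumption, a robustness result like the one above would correspond to small spreads (and no systematic shift in downstream metrics) when the same prompts are scored with each pair.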
Source paper
extracted_from (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau · +4
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Key result demonstrating advantage of stepwise over all-token steering strategy
- Comparative claim between the two steering strategies
- Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
- Maximum token savings achieved by ReflCtrl on non-mathematical general reasoning tasks
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
- Nuanced interpretive claim about the limits of steering as a mechanism for reflection enhancement.
- Demonstrates long-tail persistence of causal steering effect in a subset of emotion features
- Practical guidance for practitioners who lack ground-truth model organisms.
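The "Related by similarity" list can be read as a filter like the following minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as id pairs in either direction. The page states only the 0.65 cosine cutoff and the no-typed-edge condition, so the function names and data layout here are hypothetical.

```python
import math

THRESHOLD = 0.65  # cosine cutoff shown on this page


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_neighbors(target, embeddings, typed_edges, threshold=THRESHOLD):
    """Ids at or above the cutoff that share no typed edge with target, most similar first."""
    tv = embeddings[target]
    scored = []
    for eid, vec in embeddings.items():
        if eid == target:
            continue
        # Skip entities already linked by a typed relation in either direction.
        if (target, eid) in typed_edges or (eid, target) in typed_edges:
            continue
        s = cosine(tv, vec)
        if s >= threshold:
            scored.append((s, eid))
    scored.sort(reverse=True)
    return [eid for _, eid in scored]
```

Excluding typed-edge pairs is what makes the remaining entries useful as candidates: anything left is semantically close yet unlinked, so it is either a missing edge or a possible duplicate.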