method
active
method:few-shot-linear-probe-steering-baseline
Few-shot linear probe steering baseline
Constructs steering vectors from the difference between the mean activations on positive examples and the mean activations on negative examples; used as a comparison baseline.
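The difference-of-means construction described above can be sketched in a few lines. This is a minimal illustration, not the source's implementation: it assumes activations have already been extracted at a single fixed layer as plain lists of floats, and the names `mean_vector`, `steering_vector`, `apply_steering`, and the scale `alpha` are hypothetical. The additive application step is the commonly used form and is an assumption here, since the description only covers how the vector is constructed.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    if any(len(r) != dim for r in rows):
        raise ValueError("all activation vectors must have the same length")
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def steering_vector(pos: Sequence[Sequence[float]],
                    neg: Sequence[Sequence[float]]) -> Vector:
    """Difference of class means: mean(positive) - mean(negative)."""
    mu_pos = mean_vector(pos)
    mu_neg = mean_vector(neg)
    if len(mu_pos) != len(mu_neg):
        raise ValueError("positive and negative activations differ in width")
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def apply_steering(hidden: Sequence[float],
                   direction: Sequence[float],
                   alpha: float = 1.0) -> Vector:
    """Assumed additive form: hidden + alpha * direction."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction differ in width")
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (made up for illustration, not data from the source).
pos_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

direction = steering_vector(pos_acts, neg_acts)
steered = apply_steering([0.5, 0.5, 0.5], direction, alpha=0.5)
```

In the few-shot setting, `pos_acts` and `neg_acts` would each hold only a handful of rows, so the estimated direction can be noisy; that is one reason a baseline like this is usually swept over the number of examples.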
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Unexpected finding that the behavioral baseline underperforms representational probing approaches.
- Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
- Empirical comparison showing advantage of SAE features in low-data regime.
- Control condition with steering disabled, to confirm that self-correction is induced by steering rather than arising spontaneously.
- The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
- Test-time adaptation from a small number of examples without parameter updates.
- Providing k labeled examples in the prompt to steer model behavior.
- Baseline method: sweeps over shot count and resamples prompts; calibrates a threshold on P(TRUE) - P(FALSE); performed surprisingly weakly.