Few-shot linear probe steering baseline

Constructs steering vectors from the difference between mean activations on positive and negative examples, as a baseline for comparison.
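A minimal sketch of the difference-of-means construction follows. It assumes each example's activations have already been extracted as equal-length float vectors from a single layer; the layer, token position, and extraction method are not specified here. The names `mean_vector`, `diff_of_means_vector`, `apply_steering`, and the scale `alpha` are hypothetical. `apply_steering` shows the generic linear-steering step of adding a scaled vector to a hidden state, not necessarily how this baseline applies the vector.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(acts: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    if not acts:
        raise ValueError("need at least one activation vector")
    dim = len(acts[0])
    return [sum(a[i] for a in acts) / len(acts) for i in range(dim)]


def diff_of_means_vector(
    pos_acts: Sequence[Sequence[float]],
    neg_acts: Sequence[Sequence[float]],
) -> Vector:
    """Steering vector: mean(positive activations) - mean(negative activations)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    if len(mu_pos) != len(mu_neg):
        raise ValueError("positive and negative activations differ in width")
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def apply_steering(hidden: Sequence[float], vector: Sequence[float], alpha: float) -> Vector:
    """Hypothetical linear-steering helper: add alpha * vector to one hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy usage with made-up 2-d "activations" from a few positive/negative examples.
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [2.0, 1.0]]
v = diff_of_means_vector(pos, neg)                  # [1.0, 2.0]
steered = apply_steering([0.5, 0.5], v, alpha=2.0)  # [2.5, 4.5]
```

With only a few examples per side, the two means are noisy estimates, so the resulting direction can drift from the concept it is meant to capture.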

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Calibrated few-shot prompting was a surprisingly weak baseline for truth classification compared to linear probes · finding · 0.791
Unexpected finding that a behavioral baseline underperforms representational probing approaches.
linear steering · method · 0.781
Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
Feature steering was effective in 5 out of 7 cases where few-shot probe steering vectors failed to produce meaningful behavior change. · finding · 0.780
Empirical comparison showing the advantage of SAE features in the low-data regime.
No-Steering Baseline Experiment · method · 0.779
Control condition with steering disabled, used to confirm that self-correction is induced by steering rather than arising spontaneously.
Linear steering is often mismatched with a model's internal representation geometry, producing noisy, off-target effects. · claim · 0.766
The paper's critique of the standard linear steering baseline, supported by the days-of-week demo.
Few-shot learning · concept · 0.761
Test-time adaptation from a small number of examples without parameter updates.
few-shot prompting · method · 0.760
Providing k labeled examples in the prompt to steer model behavior.
Calibrated Few-Shot Prompting · method · 0.744
Baseline method: sweeps over shot count and resamples prompts, then calibrates a decision threshold on P(TRUE) - P(FALSE); performed surprisingly weakly.
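The threshold calibration in the Calibrated Few-Shot Prompting baseline could look roughly like the sketch below. It assumes a statement is classified TRUE when its margin P(TRUE) - P(FALSE) exceeds the threshold, and that the threshold is chosen to maximize accuracy on a labeled calibration set. The objective, the candidate-threshold scheme, and all function names are assumptions for illustration; the shot-count sweep and prompt resampling are not shown.

```python
from typing import Sequence


def margin(p_true: float, p_false: float) -> float:
    """Classification score for one statement: P(TRUE) - P(FALSE)."""
    return p_true - p_false


def accuracy(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Fraction of statements whose prediction (score > threshold) matches the label."""
    correct = sum((s > threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def calibrate_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Return the candidate threshold with the highest calibration-set accuracy.

    Candidates (assumed scheme): one value below the minimum score (everything
    predicted TRUE), midpoints between consecutive distinct scores, and the
    maximum score (everything predicted FALSE). Candidates are in ascending
    order, so ties resolve to the lowest threshold.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("need a non-empty, equally long list of scores and labels")
    uniq = sorted(set(scores))
    candidates = [uniq[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    candidates.append(uniq[-1])
    return max(candidates, key=lambda t: accuracy(scores, labels, t))


# Toy calibration set: margins for four statements and their true labels.
scores = [0.35, 0.30, 0.10, -0.20]
labels = [True, True, False, False]
t = calibrate_threshold(scores, labels)  # midpoint of 0.10 and 0.30, about 0.2
```

On this toy set, the uncalibrated rule "TRUE whenever P(TRUE) > P(FALSE)" (threshold 0) classifies 3 of 4 statements correctly, while the calibrated threshold separates them all.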