finding
active
finding:calibrated-few-shot-prompting-was-a-surprisingly-weak-baseline-for-truth-classification-compared-to-linear-probes
Calibrated few-shot prompting was a surprisingly weak baseline for truth classification compared to linear probes
Unexpected finding: the behavioral baseline underperforms representational probing approaches.
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Methods (1)
method
- Calibrated Few-Shot Prompting · supports · Baseline method: sweeps over shot count and resamples prompts; calibrates a threshold on P(TRUE) - P(FALSE); performed surprisingly weakly.
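The calibration step in the method above can be illustrated with a minimal sketch. The source says only that a threshold on P(TRUE) - P(FALSE) is calibrated; the accuracy-maximizing selection rule, the function names `calibrate_threshold` and `classify`, and the toy scores below are all assumptions made for illustration, not the paper's implementation or data.

```python
from typing import Sequence, Tuple


def calibrate_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Choose a threshold on score = P(TRUE) - P(FALSE).

    Assumption: the threshold is picked to maximize accuracy on a held-out
    calibration set. A statement is predicted TRUE when score > threshold.
    Returns (threshold, calibration_accuracy).
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal length")

    unique = sorted(set(scores))
    # Candidate thresholds: one below all scores, the midpoints between
    # adjacent distinct scores, and one above all scores.
    candidates = (
        [unique[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(unique, unique[1:])]
        + [unique[-1] + 1.0]
    )

    best_threshold, best_accuracy = candidates[0], -1.0
    for t in candidates:
        correct = sum((s > t) == bool(y) for s, y in zip(scores, labels))
        accuracy = correct / len(scores)
        if accuracy > best_accuracy:  # keep the first threshold reaching the best accuracy
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


def classify(score: float, threshold: float) -> bool:
    """Predict TRUE when the calibrated score exceeds the threshold."""
    return score > threshold


# Toy calibration set (invented values): the model leans toward answering
# TRUE, so an uncalibrated threshold of 0 would mislabel the third statement.
toy_scores = [-0.4, -0.1, 0.2, 0.3, 0.6, 0.7]
toy_labels = [0, 0, 0, 1, 1, 1]
threshold, accuracy = calibrate_threshold(toy_scores, toy_labels)
```

In a sweep over shot counts, one plausible arrangement would be to repeat this calibration for each shot count and each resampled prompt, then report accuracy on separate evaluation statements; that arrangement is likewise an assumption rather than a detail given here.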
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Providing k labeled examples in the prompt to steer model behavior.
- E2 main interpretive claim.
- Constructing steering vectors from the difference of mean activations on positive and negative examples, for comparison.
- Establishes that the observed linear structure is not merely a representation of text probability
- Interpretation of E2 results.
- Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans
- Shot count needed to reach 50% accuracy; reflects when anchoring strength crosses critical value.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.