finding

active

finding:calibrated-few-shot-prompting-was-a-surprisingly-weak-baseline-for-truth-classification-compared-to-linear-probes

Calibrated few-shot prompting was a surprisingly weak baseline for truth classification compared to linear probes

Unexpected finding: the behavioral baseline underperforms representational probing approaches.

Source paper

extracted_from

(2023) · Samuel Marks · Max Tegmark

method

Calibrated Few-Shot Prompting
supports
Baseline method: sweeps over shot count and resamples prompts, then calibrates a decision threshold on P(TRUE) - P(FALSE); performed surprisingly weakly.
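
The threshold calibration step can be illustrated with a minimal sketch. It assumes each statement has already been scored as P(TRUE) - P(FALSE) by a few-shot prompted model and that a labeled calibration set is available. The function names (calibrate_threshold, accuracy) and the toy scores are hypothetical and are not taken from the source paper; the sweep over shot counts and prompt resamples would wrap around this step.

```python
from typing import Sequence


def accuracy(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Fraction of statements classified correctly when score > threshold means TRUE."""
    hits = sum((s > threshold) == y for s, y in zip(scores, labels))
    return hits / len(scores)


def calibrate_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Pick the cut on P(TRUE) - P(FALSE) that maximizes calibration-set accuracy.

    Assumption: plain accuracy is the calibration criterion; the source does not
    say which criterion was used.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    ordered = sorted(set(scores))
    # Candidate cuts: one below all scores, midpoints between neighbors, one above all.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)
    return max(candidates, key=lambda t: accuracy(scores, labels, t))


# Toy calibration set (made-up numbers): the model leans slightly toward TRUE,
# so the naive cut at 0 misclassifies the third statement.
scores = [-0.4, -0.1, 0.2, 0.3, 0.6]
labels = [False, False, False, True, True]
threshold = calibrate_threshold(scores, labels)
```

On this toy set the uncalibrated cut at 0 gets four of five statements right, while the calibrated cut separates them all. That gap is the bias calibration is meant to remove.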

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

few-shot prompting · method · 0.811
Providing k labeled examples in the prompt to steer model behavior.
Few-shot thresholds and transition widths track ρd/dr at fixed computational complexity · claim · 0.793
E2 main interpretive claim.
Few-shot linear probe steering baseline · method · 0.791
Constructing steering vectors from the difference of mean activations on positive and negative examples, for comparison.
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on likely performing poorly on anti-correlated datasets · claim · 0.778
Establishes that the observed linear structure is not merely a representation of text probability
The ordering of few-shot thresholds k50 and transition widths aligns with k50 ∝ dr/ρd. · claim · 0.761
Interpretation of E2 results.
MM probe trained on likely dataset achieves NIE of 0.70 (false→true) on LLaMA-2-13B, surprisingly strong but weaker than truth probes · finding · 0.760
Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans
few-shot threshold (k50/θ50) · concept · 0.758
Shot count needed to reach 50% accuracy; reflects when anchoring strength crosses critical value.
Linear truth directions in LLMs are reliable primarily in factual recall cases and break down when truth assessment depends on computing and storing intermediate results. · claim · 0.757
Central empirical conclusion of the paper about the fundamental limits of truth directions.