finding
active
finding:calibrated-few-shot-prompting-was-a-surprisingly-weak-baseline-for-truth-classification-compared-to-linear-probes

Calibrated few-shot prompting was a surprisingly weak baseline for truth classification compared to linear probes

Unexpected finding: the behavioral baseline (calibrated few-shot prompting) underperformed representational probing approaches (linear probes) on truth classification

Neighborhood — ranked by edge-count

Methods (1)

method
  • Baseline method: sweeps over shot count and resamples prompts; calibrates a decision threshold on the score P(TRUE) - P(FALSE); performed surprisingly weakly
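
The threshold calibration step named in the method above can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the finding. It assumes each statement has already been scored as P(TRUE) - P(FALSE) by a few-shot prompt, and that the threshold is chosen to maximize accuracy on a labeled calibration set; the source does not state the selection criterion. The function names `calibrate_threshold` and `classify` are hypothetical, and the sweep over shot count and the prompt resampling are not shown.

```python
from typing import Sequence


def calibrate_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Pick the cutoff on P(TRUE) - P(FALSE) that maximizes calibration-set accuracy.

    Assumption: accuracy is the selection criterion (not stated in the source).
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")

    ordered = sorted(set(scores))
    # Candidate cutoffs: one below every score, the midpoints between
    # adjacent distinct scores, and one above every score.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    def accuracy(threshold: float) -> float:
        correct = sum((s > threshold) == y for s, y in zip(scores, labels))
        return correct / len(scores)

    # max() keeps the first candidate that reaches the best accuracy.
    return max(candidates, key=accuracy)


def classify(score: float, threshold: float) -> bool:
    """Predict TRUE when the score exceeds the calibrated threshold."""
    return score > threshold


# Illustrative values only (not data from the finding): a model biased toward
# answering TRUE, so an uncalibrated cutoff of 0.0 mislabels some false statements.
calibration_scores = [0.30, 0.10, -0.05, 0.25, -0.40, 0.05]
calibration_labels = [True, False, False, True, False, False]
threshold = calibrate_threshold(calibration_scores, calibration_labels)
```

Calibrating the cutoff rather than fixing it at 0 compensates for a model's overall bias toward one answer token, which is why the baseline is described as calibrated.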

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.