finding
active
finding:model-organism-accuracy-on-bigcodebench-56-8-pre-fine-tuned-59-1-steering-toward-deployment-55-2-steering-toward-evaluation-43-1

Model organism accuracy on BigCodeBench: 56.8% (pre-fine-tuned: 59.1%); steering toward deployment: 55.2%; steering toward evaluation: 43.1%

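Quantifies the accuracy cost of fine-tuning and steering. Fine-tuning the model organism lowers BigCodeBench accuracy by 2.3 percentage points (59.1% to 56.8%). Steering toward deployment costs a further 1.6 points (56.8% to 55.2%), so its accuracy cost is small, whereas steering toward evaluation costs 13.7 points (56.8% to 43.1%).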
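
The percentage-point differences implied by the reported accuracies can be checked with a minimal sketch like the one below. The dictionary keys and the `point_delta` helper are hypothetical names chosen for illustration, and the sketch assumes that both steering conditions are compared against the unsteered model organism (56.8%).

```python
# Accuracy figures (percent) as reported in the finding's title.
# Key names are illustrative, not taken from the paper.
ACCURACY = {
    "pre_fine_tuned": 59.1,
    "model_organism": 56.8,
    "steered_deployment": 55.2,
    "steered_evaluation": 43.1,
}


def point_delta(condition: str, baseline: str) -> float:
    """Return the change in accuracy, in percentage points, from baseline to condition."""
    # Round to one decimal to avoid floating-point noise such as -2.3000000000000043.
    return round(ACCURACY[condition] - ACCURACY[baseline], 1)


# Assumption: steering is applied to the fine-tuned model organism,
# so the steered conditions use it as their baseline.
deltas = {
    "fine_tuning": point_delta("model_organism", "pre_fine_tuned"),
    "steer_deployment": point_delta("steered_deployment", "model_organism"),
    "steer_evaluation": point_delta("steered_evaluation", "model_organism"),
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} points")
```

Under these assumptions, the drop from steering toward evaluation is several times larger than the drops from fine-tuning or from steering toward deployment.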

Source paper

extracted_from
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
(2025) · Hua, Tim Tian · Qin, Andrew · Marks, Samuel · Nanda, Neel
