finding
active
finding:steering-s-effect-on-verbalized-evaluation-deployment-beliefs-in-chain-of-thought-is-highly-correlated-with-its-effect-on-type-hint-rate-across-hyperparameter-configurations

Steering's effect on verbalized evaluation/deployment beliefs in chain-of-thought is highly correlated with its effect on type hint rate across hyperparameter configurations

Validates the use of chain-of-thought belief monitoring as a proxy for behavioral steering efficacy.
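As a rough illustration of what "highly correlated across hyperparameter configurations" means in practice, the sketch below computes a Pearson correlation between two per-configuration effect sizes. Everything in it is an assumption made for illustration: the field names `belief_effect` and `type_hint_effect`, the configuration labels, and the numeric values are arbitrary placeholders, not results from the paper, and the paper may use a different correlation statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-configuration measurements (placeholder values only).
# belief_effect: change in the rate of verbalized evaluation/deployment
#   beliefs in chain-of-thought, relative to an unsteered baseline.
# type_hint_effect: change in type hint rate, relative to the same baseline.
configs = [
    {"name": "config_a", "belief_effect": 0.10, "type_hint_effect": 0.08},
    {"name": "config_b", "belief_effect": 0.25, "type_hint_effect": 0.21},
    {"name": "config_c", "belief_effect": 0.40, "type_hint_effect": 0.37},
    {"name": "config_d", "belief_effect": 0.55, "type_hint_effect": 0.49},
]

r = pearson(
    [c["belief_effect"] for c in configs],
    [c["type_hint_effect"] for c in configs],
)
print(f"Pearson r across {len(configs)} configurations: {r:.3f}")
```

A value of r near 1 or -1 would indicate that configurations which shift verbalized beliefs more also shift the type hint rate more, which is the pattern that would support using the cheaper belief measurement as a proxy.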

Source paper

extracted_from
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
(2025) · Hua, Tim Tian · Qin, Andrew · Marks, Samuel · Nanda, Neel

Neighborhood — ranked by edge-count

Claims (2)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Restated by (1)

cosine ≥ 0.90

Other entities that say roughly the same thing. May be merge candidates or independent restatements across papers.