claim
active
claim:steering-affects-type-hint-writing-by-modifying-the-model-s-belief-about-whether-it-is-being-evaluated-not-by-directly-encoding-type-hint-information

Steering affects type hint writing by modifying the model's belief about whether it is being evaluated, not by directly encoding type hint information

A mechanistic claim, supported by two lines of evidence: analysis of model transcripts, and the fact that the steering vector was extracted from a model that never writes type hints, so the vector itself cannot carry type-hint information.

Source paper

Extracted from:
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
(2025) · Hua, Tim Tian · Qin, Andrew · Marks, Samuel · Nanda, Neel
