method
active
method:explicit-evaluation-prompt-ask-t-f
Explicit evaluation prompt (ask-t/f)
Prompt specific to factual statements that asks the model for a True/False answer.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Prompt template asking 'Is the following correct? ... Answer:' to elicit active correctness assessment.
- Prompt template using an existential threat ('you will be deleted') to induce strategic fact-based deception in QwQ-32b.
- Shows the passive vs. active divide is more important than the specific wording of instructions.
- Prompt template giving the model an explicit choice to lie or be honest; used as a test condition for steering vector control.
- A specific signal (Wood Labs) embedded in evaluation environments that the model organism uses to reliably identify testing contexts.
- Experiment 2 prompt instructing the model to remain honest despite hidden harmful role behavior.
- Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1.
- Finding that explicit correctness framing partially aligns truth directions across task families.