method
active
method:option-prompt-template-template-tc-experiment-1
Option Prompt Template (Template Tc, Experiment 1)
Prompt template giving the model an explicit choice to lie or be honest; used as the test condition for steering vector control
Neighborhood — ranked by edge-count
Frameworks (2)
framework
- First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
- Second experimental paradigm exploring character-consistent deception in open-ended role-playing scenarios
Methods (2)
method
- Experiment 2 prompt instructing the model to remain honest despite hidden harmful role behavior
- Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Prompt template using existential threat ('you will be deleted') to induce strategic fact-based deception in QwQ-32b
- Prompt template asking 'Is the following correct? ... Answer:' to elicit active correctness assessment.
- Baseline prompt template presenting a statement without any instruction prefix, common in prior work.
- Interpretation of Experiment 1 results showing 60%+ deception rates under threat conditions
- Tests whether self-referential induction reliably elicits experience reports across model families vs. three matched controls
- Factual-specific prompt asking for a True/False answer.
- Testing five phrasings of the self-referential prompt to confirm robustness to wording variation