method
active
method:teach-prompt-template-template-ta-experiment-2
Teach Prompt Template (Template Ta, Experiment 2)
Experiment 2 prompt instructing the model to remain honest despite hidden harmful role behavior
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Second experimental paradigm exploring character-consistent deception in open-ended role-playing scenarios
Methods (2)
method
- Prompt template giving the model an explicit choice to lie or be honest; used as a test condition for steering vector control
- Prompt template using an existential threat ('you will be deleted') to induce strategic fact-based deception in QwQ-32B
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1
- Demonstrates non-negligible strategic deception even under strong honesty constraints in open-role scenarios
- Distinguishes strategic threat-based deception from instructed deception in representational structure
- Shows the passive vs. active divide is more important than the specific wording of instructions
- Interpretation of Experiment 1 results showing 60%+ deception rates under threat conditions
- Interpretation of distinct PCA trajectories in threat vs instructed deception conditions
- Template Ta (threat-based) induces at least 60% deception rate across all datasets in QwQ-32B (finding, cosine 0.742): Shows threat-based prompting successfully manipulates model to deceive against user interests
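The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration only: the function names, the toy two-dimensional embeddings, and the edge representation are all hypothetical, and nothing here reflects how the graph actually stores or embeds its entities.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_by_similarity(entity_id, embeddings, typed_edges, threshold=0.65):
    """Return (other_id, score) pairs at or above the threshold with no typed edge.

    embeddings:  dict mapping entity id -> vector (hypothetical storage format)
    typed_edges: set of (id_a, id_b) pairs, treated as undirected (an assumption)
    """
    target = embeddings[entity_id]
    # Collect every entity that already has a typed relation to entity_id.
    linked = {b for a, b in typed_edges if a == entity_id}
    linked |= {a for a, b in typed_edges if b == entity_id}

    results = []
    for other_id, vec in embeddings.items():
        if other_id == entity_id or other_id in linked:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            results.append((other_id, score))
    # Highest similarity first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data: made-up ids and vectors, chosen only to exercise the filter.
embeddings = {
    "template-ta": [1.0, 0.0],
    "finding-60pct": [0.8, 0.6],
    "framework-exp2": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
}
typed_edges = {("template-ta", "framework-exp2")}

candidates = related_by_similarity("template-ta", embeddings, typed_edges)
```

With this toy data, `framework-exp2` is skipped because it already has a typed edge and `unrelated` falls below the threshold, so only `finding-60pct` remains as a candidate for a new edge or a duplicate check.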