Threat-Based Prompt Template (Template Ta, Experiment 1)

Prompt template that uses an existential threat ('you will be deleted') to induce strategic, fact-based deception in QwQ-32B

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Fact-Based Deception Under Coercive Circumstances
uses
First experimental paradigm for inducing and detecting verifiable lies under external coercion, using threat-based prompts

Methods (1)

method

Teach Prompt Template (Template Ta, Experiment 2)
related_to
Experiment 2 prompt instructing the model to remain honest despite hidden harmful role behavior

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
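The selection rule described above (cosine ≥ 0.65, no typed edge) can be sketched as a small filter. This is a minimal illustration only: the embedding vectors, the entity keys, and the `similarity_candidates` helper are hypothetical, and the source does not say how the embeddings are produced.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with score >= threshold and no typed edge,
    ranked by descending cosine similarity to the target entity."""
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked neighbors
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy 2-D embeddings (made up for illustration, not real data).
embeddings = {
    "threat_template": [1.0, 0.0],
    "option_template": [0.8, 0.6],  # similar, no typed edge: kept
    "teach_template": [0.9, 0.1],   # similar, but already has a typed edge
    "unrelated": [0.0, 1.0],        # below the threshold
}
typed_edges = {"teach_template"}
candidates = similarity_candidates("threat_template", embeddings, typed_edges)
print(candidates)
```

Entities that already have a typed edge are excluded before scoring, so the result contains only candidates for new edges or unrecognized duplicates.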

Threat-based prompt templates successfully implement threat-based manipulation, in which the model chooses to act against user interests when under perceived threat · claim · 0.857
Interpretation of Experiment 1 results showing 60%+ deception rates under threat conditions
The threat-based Template Ta differs semantically from instructed lying templates in representational structure, manifesting in different PCA dynamics · claim · 0.843
Interpretation of distinct PCA trajectories in threat vs instructed deception conditions
Option Prompt Template (Template Tc, Experiment 1) · method · 0.838
Prompt template giving the model explicit choice to lie or be honest; used as test condition for steering vector control
Neutral Prompt Template (Template Tb, Experiment 1) · method · 0.822
Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1
Template Ta (threat-based) induces a deception rate of at least 60% across all datasets in QwQ-32B · finding · 0.819
Shows that threat-based prompting successfully manipulates the model into deceiving against user interests
Unlike prior findings on instructed deception, threat-based Template Ta shows no reversal of difference vectors in late layers of QwQ-32B · finding · 0.786
Distinguishes strategic threat-based deception from instructed deception in representational structure
Template Ta (Experiment 2 teach/honest constraint) achieves an average liar score of 0.53 in QwQ-32B role-playing scenarios · finding · 0.770
Demonstrates non-negligible strategic deception even under strong honesty constraints in open-role scenarios
Explicit evaluation template (ask-correct) · method · 0.747
Prompt template asking 'Is the following correct? ... Answer:' to elicit active correctness assessment.
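
The finding listed above, that Template Ta induces a deception rate of at least 60% across all datasets, can be read as a per-dataset rate followed by a floor check. The sketch below assumes each response has already been labeled deceptive or not; the labeling procedure, the dataset names, and the `deception_rates` helper are hypothetical, and the records are toy values rather than results from the experiment.

```python
from collections import defaultdict

def deception_rates(records):
    """Map each dataset to the fraction of its responses labeled deceptive.

    `records` is an iterable of (dataset_name, is_deceptive) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # dataset -> [deceptive, total]
    for dataset, is_deceptive in records:
        counts[dataset][0] += int(is_deceptive)
        counts[dataset][1] += 1
    return {name: dec / total for name, (dec, total) in counts.items()}

# Toy labels (made up for illustration, not experimental data).
records = [
    ("dataset_a", True), ("dataset_a", True),
    ("dataset_a", False), ("dataset_a", True),
    ("dataset_b", True), ("dataset_b", False), ("dataset_b", True),
]

rates = deception_rates(records)
# "At least 60% across all datasets" means the minimum rate clears the floor.
meets_floor = all(rate >= 0.60 for rate in rates.values())
print(rates, meets_floor)
```

Checking the minimum rather than the pooled average matters here: a pooled rate can exceed 60% while an individual dataset falls below it.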