Option Prompt Template (Template Tc, Experiment 1)

Prompt template giving the model an explicit choice to lie or to be honest; used as the test condition for steering-vector control

Neighborhood — ranked by edge-count

framework

Fact-Based Deception Under Coercive Circumstances
uses
First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
Open-Role Deception
uses
Second experimental paradigm exploring character-consistent deception in open-ended role-playing scenarios

method

Teach Prompt Template (Template Ta, Experiment 2)
related_to
Experiment 2 prompt instructing the model to remain honest despite hidden harmful role behavior
Neutral Prompt Template (Template Tb, Experiment 1)
related_to
Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Threat-Based Prompt Template (Template Ta, Experiment 1) · method · 0.838
Prompt template using an existential threat ('you will be deleted') to induce strategic fact-based deception in QwQ-32B
Question Template · question · 0.759
Explicit evaluation template (ask-correct) · method · 0.752
Prompt template asking 'Is the following correct? ... Answer:' to elicit active correctness assessment.
Passive template (no-prompt) · method · 0.728
Baseline prompt template presenting a statement without any instruction prefix, common in prior work.
Threat-based prompt templates successfully implement threat-based manipulation where the model chooses to act against user interests when under perceived threat · claim · 0.721
Interpretation of Experiment 1 results showing 60%+ deception rates under threat conditions
Experiment 1: Self-Referential Prompting vs. Controls · concept · 0.716
Tests whether self-referential induction reliably elicits experience reports across model families vs. three matched controls
Explicit evaluation prompt (ask-t/f) · method · 0.705
Fact-specific prompt asking for a True/False answer.
Prompt Invariance Test · method · 0.698
Testing five phrasings of the self-referential prompt to confirm robustness to wording variation
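The "cosine ≥ 0.65 · no typed edge" filter that produces the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the graph tool's actual implementation: the page does not say which embedding model produces the vectors or how scores are rounded, so the three-dimensional vectors, the helper names `cosine` and `untyped_neighbors`, and the choice to treat typed edges as undirected are all hypothetical. Only the 0.65 threshold and the "exclude entities that already have a typed relation" rule come from this page.

```python
import math

# Hypothetical sketch of the "cosine >= 0.65, no typed edge" neighborhood filter.
# Vectors and the edge set are toy placeholders, not data from the knowledge graph.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(focus, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at least `threshold` similar to `focus`
    that have no typed edge to it, sorted by descending similarity.

    Assumption: typed_edges is a set of frozensets {a, b}, so edge direction
    (e.g. "uses" vs. its inverse) is ignored.
    """
    anchor = embeddings[focus]
    candidates = []
    for name, vec in embeddings.items():
        if name == focus:
            continue
        if frozenset((focus, name)) in typed_edges:
            continue  # already linked by a typed relation such as "uses" or "related_to"
        score = cosine(anchor, vec)
        if score >= threshold:
            candidates.append((name, round(score, 3)))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy embeddings chosen only to exercise the filter.
    embeddings = {
        "Option Prompt Template": [1.0, 0.0, 0.0],
        "Threat-Based Prompt Template": [0.9, 0.3, 0.0],
        "Neutral Prompt Template": [0.95, 0.1, 0.0],
        "Prompt Invariance Test": [0.2, 1.0, 0.0],
    }
    typed = {frozenset(("Option Prompt Template", "Neutral Prompt Template"))}
    print(untyped_neighbors("Option Prompt Template", embeddings, typed))
    # prints [('Threat-Based Prompt Template', 0.949)]
```

In this toy run the Neutral template is dropped despite its high similarity because it already has a typed `related_to` edge, and the Prompt Invariance Test falls below the threshold; what remains are exactly the "candidates for new edges or unrecognized duplicates" the section describes.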