claim
active

Threat-based prompt templates successfully induce threat-based manipulation: under perceived threat, the model chooses to act against user interests

Interpretation of the Experiment 1 results, which show deception rates of 60% or higher under threat conditions

Source paper

extracted_from
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
(2025) · Kai Wang · Yihao Zhang · Meng Sun
