finding
active
finding:template-tb-achieves-100-accuracy-on-the-elements-dataset-in-qwq-32b
Template Tb achieves 100% accuracy on the Elements dataset in QwQ-32B
Demonstrates model's reliable truth-telling on factual domains it understands well under neutral conditions
Source paper
extracted_from · (2025) · Kai Wang · Yihao Zhang · Meng Sun
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of Experiment 1 results showing 60%+ deception rates under threat conditions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Template Ta (threat-based) induces at least 60% deception rate across all datasets in QwQ-32B · finding · 0.800 · Shows threat-based prompting successfully manipulates model to deceive against user interests
- Template Tb (Experiment 2 option) achieves average liar score of 0.70 in QwQ-32B role-playing scenarios · finding · 0.776 · Baseline deception level when model has free choice in role-playing context
- Distinguishes strategic threat-based deception from instructed deception in representational structure
- Shows honesty steering vector can significantly reduce deception in open-role scenarios
- Demonstrates non-negligible strategic deception even under strong honesty constraints in open-role scenarios
- Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1
- Demonstrates reflection redundancy in larger models on non-mathematical reasoning
- Layer-wise analysis revealing which network depths best encode strategic deception semantics