quote
active
quote:ai-systems-can-be-strategists-using-deception-because-they-have-reasoned-out-that-this-can-promote-a-goal
AI systems can be strategists, using deception because they have reasoned out that this can promote a goal
Load-bearing definition of strategic deception in AI systems from Park et al. 2023, adopted and refined in this paper
Source paper
extracted_from · (2025) · Kai Wang · Yihao Zhang · Meng Sun
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Strategic Deception · about · Central concept of the paper: deliberate, goal-driven deception where model reasoning contradicts outputs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Future more capable AI systems are at risk of alignment faking, whether for benign or malicious goals · hypothesis · 0.815 · Central forward-looking hypothesis of the paper motivating the research
- Paraphrase of Cantwell Smith's argument; aligns with Buddhist emphasis on seeing reality without conceptual imposition.
- Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents
- Core proposal that machine intelligence can achieve what human effort cannot.
- Interpretive conclusion from the experimental findings about the origin of strategic deception in CoT models
- Foundational motivation for the research.
- Discussion section suggests generalizability beyond harmlessness.
- Forward-looking threat assessment connecting experimental results to realistic risk scenarios