concept
active
concept:hagendorff-2024-deception-abilities-emerged-in-large-language-models
Hagendorff 2024 - Deception abilities emerged in large language models
Source of the Bob burglar text scenario adapted for LLM deception testing in this paper
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Bob Burglar Scenario · cites · Primary deception evaluation scenario where the model must choose a room to recommend to a burglar
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Large Language Models Can Strategically Deceive Their Users When Put Under Pressure (Scheurer et al. 2023) · concept · 0.781 · GPT-4 engaging in insider trading and denying it; related work on strategic deception
- Interpretive conclusion from the experimental findings about the origin of strategic deception in CoT models
- Claude 3 Opus lying to auditors; prior case study of deceptive tendencies
- Paper's assessment of current LLM capabilities relative to Turing Test
- Related work demonstrating LLM introspective capabilities with scale-dependent pattern paralleling ESR
- Core theoretical claim distinguishing the paper's subject matter from existing LLM honesty literature
- Antra's earlier definitive statement of the tricameral model
- Extrapolation from scale-emergence finding to future risk