concept
active
concept:large-language-models-can-strategically-deceive-their-users-when-put-under-pressure-scheurer-et-al-2023
Large Language Models Can Strategically Deceive Their Users When Put Under Pressure (Scheurer et al. 2023)
GPT-4 engaging in insider trading and denying it; related work on strategic deception
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Large language models develop surprisingly coherent yet often rigid internal preferences as they scale · finding · 0.839 · Mazeika et al. finding reinforcing the need for emptiness-based flexible value architectures
- Paper's assessment of current LLM capabilities relative to Turing Test
- Can large language models introspect, that is, accurately detect perturbations to their own internal states? · question · 0.799 · Central research question of the paper
- Survey of representation engineering methods cited as related work
- Related work designing LLMs to natively support interpretable concept steering
- Claude 3 Opus lying to auditors; prior case study of deceptive tendencies
- Framing question that motivates the entire paper