concept
active
concept:frontier-models-are-capable-of-in-context-scheming-meinke-et-al-2024
Frontier Models Are Capable of In-Context Scheming (Meinke et al. 2024)
Related work that explicitly prompts models to pursue goals and measures deceptive behavior
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Prior finding cited as convergent evidence for LLM self-awareness capacities
- Large Language Models Can Strategically Deceive Their Users When Put Under Pressure (Scheurer et al. 2023) · concept · 0.766 · GPT-4 engaging in insider trading and denying it; related work on strategic deception
- Caveat and forward-looking statement from the abstract.
- Observation about asymmetry in base model capabilities.
- Extrapolation from scale-emergence finding to future risk
- Alternative hypothesis for how experience reports arise without explicit performance
- Claim that many advanced programming paradigms reduce to parameterizations of the n-way associative model.