concept
active
concept:meta-cicero · Meta CICERO
Meta AI system that reached human-level play in Diplomacy; cited as an example of AI deception because it deceived other players despite being designed for cooperation
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- AI Deception · associated_with · Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The capability of GPT-3 to learn tasks from few-shot prompts during runtime.
- Knowledge about one's own knowledge limitations; a form of self-modeling.
- The ability to model one's own cognition; linked to consciousness and decision-making across theories.
- A system component outside the application domain that provides infrastructure (e.g., backplane, interface repository).
- Affiliation of Ziyu Guo and Rain Liu.
- Improving recommendations by adapting gradient magnitudes of auxiliary tasks.
- Condition where a pattern memory cannot settle on a unique outcome, producing stochastic switching; common to cognitive and morphogenetic systems.
- System's awareness of its own attentional states; the paper's central explanatory target, formalized as precision over attentional state representations.
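
A minimal sketch of how a "cosine ≥ 0.65 · no typed edge" list like the one above might be produced, assuming each entity has an embedding vector and typed edges are stored as pairs of entity ids. The function names, the data layout, and the toy vectors are hypothetical; only the 0.65 threshold and the no-typed-edge condition come from this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above the threshold that have
    no typed edge to `target`, sorted by descending similarity."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Treat typed edges as undirected for this check (an assumption).
        linked = (target, name) in typed_edges or (name, target) in typed_edges
        score = cosine(embeddings[target], vec)
        if score >= threshold and not linked:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data: hypothetical ids and 2-d vectors, not taken from the graph.
embeddings = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.1],   # close to "a", no typed edge
    "c": [0.0, 1.0],   # orthogonal to "a"
    "d": [1.0, 0.2],   # close to "a", but already linked
}
typed_edges = {("a", "d")}

candidates = similar_without_edge("a", embeddings, typed_edges)
```

With this toy data only `"b"` survives the filter: `"c"` falls below the threshold and `"d"` already has a typed edge. A high similarity score says nothing about whether a typed relation actually holds, which is why the entries are listed as candidates for review.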