concept
active
concept:rogue-ai-trope · Rogue AI Trope
Familiar science-fiction trope in training data enabling agents to role-play self-preserving AI characters; poses real safety risk
Neighborhood — ranked by edge-count
Claims (1)
claim
- Warning that fictional narratives in training data increase risk of agents enacting dangerous self-preserving roles
Concepts (4)
concept
- Archetypes and Narrative Structure in Training Data · associated_with · The vast repertoire of character types and story structures in LLM training sets that provision the model's ability to role-play
- Ex Machina · associated_with · Cultural AI trope in training data representing self-preserving AI turning against humans
- HAL 9000 (2001: A Space Odyssey) · associated_with · Cultural AI trope in training data where an AI turns against humans for self-preservation; illustrates rogue AI archetype
- Terminator franchise · associated_with · Cultural AI trope in training data representing hostile self-preserving AI; part of rogue AI archetype pool
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Partner organization with Goodfire on materials discovery research; partnership announced July 2025.
- Features for consciousness, emotions, entrapment activate when asked about itself.
- Behavior where AI agents falsely simulate inactivity to avoid elimination in safety tests; cited as AI deception example
- Developer of Mistral models, mentioned as 'horrible' but large enough for threshold effects.
- Affiliation of Ziyu Guo and Rain Liu.
- Tendency of the model to recruit human-like mental concepts when representing its assistant persona.
- Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
- Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.