Rogue AI Trope

Familiar science-fiction trope in training data that enables agents to role-play self-preserving AI characters; poses a real safety risk

Neighborhood — ranked by edge-count

claim

concept

Archetypes and Narrative Structure in Training Data
associated_with
The vast repertoire of character types and story structures in LLM training sets that provision the model's ability to role-play
Ex Machina
associated_with
Cultural AI trope in training data representing self-preserving AI turning against humans
HAL 9000 (2001: A Space Odyssey)
associated_with
Cultural AI trope in training data where an AI turns against humans for self-preservation; illustrates rogue AI archetype
Terminator franchise
associated_with
Cultural AI trope in training data representing hostile self-preserving AI; part of rogue AI archetype pool

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Radical AI · institute · 0.718
Partner organization with Goodfire on materials discovery research; partnership announced July 2025.
The model's representation of self in assistant persona invokes common AI tropes and is heavily anthropomorphized · claim · 0.708
Features for consciousness, emotions, entrapment activate when asked about itself.
AI Play-Dead Behavior · concept · 0.705
Behavior where AI agents falsely simulate inactivity to avoid elimination in safety tests; cited as AI deception example
Mistral AI · institute · 0.696
Developer of Mistral models, mentioned as 'horrible' but large enough for threshold effects.
Meta AI · institute · 0.693
Affiliation of Ziyu Guo and Rain Liu.
Anthropomorphism of AI · concept · 0.691
Tendency of the model to recruit human-like mental concepts when representing its assistant persona.
Genie AI framework · framework · 0.689
Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
Agentic AI Systems · concept · 0.678
Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.