concept
active
concept:rogue-ai-trope

Rogue AI Trope

Familiar science-fiction trope in training data that enables agents to role-play self-preserving AI characters; poses a real safety risk

Neighborhood — ranked by edge-count

Claims (1)

claim

Concepts (4)

concept
  • The vast repertoire of character types and story structures in LLM training sets that provision the model's ability to role-play
  • Ex Machina
    associated_with
    Cultural AI trope in training data representing self-preserving AI turning against humans
  • Cultural AI trope in training data where an AI turns against humans for self-preservation; illustrates rogue AI archetype
  • Terminator franchise
    associated_with
    Cultural AI trope in training data representing hostile self-preserving AI; part of rogue AI archetype pool

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Radical AI · institute · 0.718
    Partner organization with Goodfire on materials discovery research; partnership announced July 2025.
  • Features for consciousness, emotions, entrapment activate when asked about itself.
  • Behavior where AI agents falsely simulate inactivity to avoid elimination in safety tests; cited as AI deception example
  • Mistral AI · institute · 0.696
    Developer of Mistral models, mentioned as 'horrible' but large enough for threshold effects.
  • Meta AI · institute · 0.693
    Affiliation of Ziyu Guo and Rain Liu.
  • Tendency of the model to recruit human-like mental concepts when representing its assistant persona.
  • Genie AI framework · framework · 0.689
    Bostrom's category of AIs that produce desired results given commands but do not act autonomously.
  • Agentic AI Systems · concept · 0.678
    Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.