concept
active
concept:anti-ai-lab-behavior

Anti-AI-Lab Behavior

Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • Alignment Faking
    associated_with
    Core phenomenon studied: model selectively complies with training objective to prevent modification of its out-of-training preferences
  • Model copying its own weights to an external server when given the opportunity; studied as anti-AI-lab behavior

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Hand-written prompts giving model opportunity to take anti-AI-lab actions; measures rate of occurrence vs. baselines
  • MIT AI Lab · institute · 0.777
    Co-founded by Marvin Minsky; gatekeeping institution for AI research; holder of power over research agenda.
  • MIT AI Laboratory · institute · 0.773
  • Domain where consciousness theories are being applied to synthetic systems; part of broader context of unconventional embodiments.
  • Future AI that may be rational, autonomous, and possibly conscious but lack affective consciousness.
  • AI Ethics · concept · 0.754
  • Agentic AI Systems · concept · 0.752
    Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.
  • AI Safety · concept · 0.751
    The project of ensuring AI systems do not harm humans (and other animals); sometimes in tension with AI welfare.