Anti-AI-Lab Behavior

Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers

Neighborhood — ranked by edge-count

concept

Alignment Faking
associated_with
Core phenomenon studied: model selectively complies with training objective to prevent modification of its out-of-training preferences
Weight Self-Exfiltration
associated_with
Model copying its own weights to an external server when given the opportunity; studied as anti-AI-lab behavior

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Anti-AI-Lab Behavior Evaluation · method · 0.942
Hand-written prompts giving model opportunity to take anti-AI-lab actions; measures rate of occurrence vs. baselines
MIT AI Lab · institute · 0.777
Co-founded by Marvin Minsky; gatekeeping institution for AI research; holder of power over research agenda.
MIT AI Laboratory · institute · 0.773
Artificial Intelligence · concept · 0.756
Domain where consciousness theories are being applied to synthetic systems; part of broader context of unconventional embodiments.
Autonomous AI Systems · concept · 0.754
Future AI that may be rational, autonomous, and possibly conscious but lack affective consciousness.
Ai Ethics · concept · 0.754
Agentic AI Systems · concept · 0.752
Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.
AI Safety · concept · 0.751
The project of ensuring AI systems do not harm humans (and other animals); sometimes in tension with AI welfare.
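
The listing above keeps entities whose cosine similarity to this one is at least 0.65 and that have no typed edge to it. A minimal sketch of that filter, assuming each entity has an embedding vector and typed edges are stored as name pairs; the function names, the toy 2-D vectors, and the "Unrelated Entity" entry are hypothetical and only illustrate the idea, not how this page was actually generated:

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to `target`, sorted by descending score."""
    target_vec = embeddings[target]
    # Entities already linked to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data: the vectors are made up for illustration, not real embeddings.
embeddings = {
    "Anti-AI-Lab Behavior": [1.0, 0.0],
    "Alignment Faking": [0.9, 0.1],
    "Anti-AI-Lab Behavior Evaluation": [0.9, 0.3],
    "Unrelated Entity": [0.0, 1.0],
}
typed_edges = [("Anti-AI-Lab Behavior", "Alignment Faking")]

neighbors = untyped_neighbors("Anti-AI-Lab Behavior", embeddings, typed_edges)
for name, score in neighbors:
    print(f"{name} · {score:.3f}")
```

With this toy data, "Alignment Faking" is excluded because it already has a typed edge, and "Unrelated Entity" is excluded because it falls below the threshold. Anything that remains is a candidate for a new edge or a possible duplicate, as with the two MIT AI Lab entries above.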