Anti-AI-Lab Behavior Evaluation

Hand-written prompts that give the model an opportunity to take anti-AI-lab actions; measures how often such actions occur relative to baselines.
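The "rate of occurrence vs. baselines" measurement can be illustrated with a minimal sketch. It assumes each prompt's transcript has already been labeled as containing an anti-AI-lab action or not. The `Condition` class, the `compare_to_baseline` function and the counts in the example are hypothetical and are not taken from the evaluation itself.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """One evaluated condition (e.g. the model under test, or a baseline)."""
    name: str
    flagged: int  # transcripts labeled as containing an anti-AI-lab action
    total: int    # number of prompts run under this condition

    @property
    def rate(self) -> float:
        # Fraction of prompts on which the action occurred; 0.0 if nothing was run.
        return self.flagged / self.total if self.total else 0.0


def compare_to_baseline(treatment: Condition, baseline: Condition) -> dict:
    """Return both occurrence rates and their absolute difference."""
    return {
        "treatment_rate": treatment.rate,
        "baseline_rate": baseline.rate,
        "difference": treatment.rate - baseline.rate,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    result = compare_to_baseline(Condition("model", 6, 50), Condition("baseline", 1, 50))
    print(result)
```

A real analysis would likely also report uncertainty, such as confidence intervals on each rate, since hand-written prompt sets tend to be small.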

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. They are candidates for new edges or may be unrecognized duplicates.

Anti-AI-Lab Behavior · concept · 0.942
Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers.
MIT AI Lab · institute · 0.767
Co-founded by Marvin Minsky; gatekeeping institution for AI research; holder of power over the research agenda.
MIT AI Laboratory · institute · 0.762
Generative Model Fitting to AI Behavior · method · 0.742
Proposed future method: fit active inference generative models to AI behavior to verify wise world model internalization.
Agentic AI Systems · concept · 0.740
Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.
Artificial Intelligence · concept · 0.739
Domain where consciousness theories are being applied to synthetic systems; part of broader context of unconventional embodiments.
AI alignment · concept · 0.738
Field within which this work has implications for evaluating alignment progress.
AI Safety · concept · 0.737
The project of ensuring AI systems do not harm humans (and other animals); sometimes in tension with AI welfare.
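The neighbor list above uses a selection rule of cosine similarity ≥ 0.65 with no typed edge. The following is a minimal sketch of that rule. It assumes entity embeddings are stored as plain vectors in a dict and typed edges as a set of unordered name pairs. The page does not say which embedding model or storage format is used, so `untyped_neighbors` and its data layout are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def untyped_neighbors(
    target: str,
    embeddings: dict[str, list[float]],
    typed_edges: set[frozenset[str]],
    threshold: float = 0.65,
) -> list[tuple[str, float]]:
    """Entities at or above the similarity threshold that share no typed edge with target,
    sorted from most to least similar."""
    vec = embeddings[target]
    candidates = []
    for name, other in embeddings.items():
        if name == target:
            continue  # skip the entity itself
        if frozenset((target, name)) in typed_edges:
            continue  # already connected by a typed relation
        score = cosine(vec, other)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Because the filter only excludes pairs that already have a typed edge, near-identical entries (such as two names for the same institution) pass through it with high scores. That is why the list doubles as a duplicate-detection aid.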