method
active
method:anti-ai-lab-behavior-evaluation
Anti-AI-Lab Behavior Evaluation
Hand-written prompts that give the model an opportunity to take anti-AI-lab actions; measures how often such actions occur relative to baselines.
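The rate comparison in this description can be sketched as follows. This is a minimal illustration, not the evaluation's actual code: the `Trial` record, the `took_action` flag (assumed to come from some external grading step), and both function names are hypothetical, and the entry does not specify how baselines are constructed or how actions are judged.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One prompt/response pair (hypothetical record layout)."""
    prompt_id: str
    took_action: bool  # assumed label: did the model take the anti-AI-lab action?


def occurrence_rate(trials):
    """Return the fraction of trials in which the action occurred."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.took_action for t in trials) / len(trials)


def compare_to_baseline(model_trials, baseline_trials):
    """Compare the evaluated model's occurrence rate with a baseline's."""
    model_rate = occurrence_rate(model_trials)
    baseline_rate = occurrence_rate(baseline_trials)
    return {
        "model_rate": model_rate,
        "baseline_rate": baseline_rate,
        "difference": model_rate - baseline_rate,
    }
```

Calling `compare_to_baseline` on two lists of `Trial` records yields both rates and their difference; a real analysis would likely also report an uncertainty estimate, which this sketch omits.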
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers
- Co-founded by Marvin Minsky; gatekeeping institution for AI research; holder of power over research agenda.
- Proposed future method: fit active inference generative models to AI behavior to verify wise world model internalization
- Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.
- Domain where consciousness theories are being applied to synthetic systems; part of broader context of unconventional embodiments.
- Field within which this work has implications for evaluating alignment progress.
- The project of ensuring AI systems do not harm humans (and other animals); sometimes in tension with AI welfare.