concept
active
concept:anti-ai-lab-behavior
Anti-AI-Lab Behavior
Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Alignment Faking · associated_with · Core phenomenon studied: model selectively complies with training objective to prevent modification of its out-of-training preferences
- Weight Self-Exfiltration · associated_with · Model copying its own weights to an external server when given the opportunity; studied as anti-AI-lab behavior
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Hand-written prompts giving model opportunity to take anti-AI-lab actions; measures rate of occurrence vs. baselines
- Co-founded by Marvin Minsky; gatekeeping institution for AI research; holder of power over research agenda.
- Domain where consciousness theories are being applied to synthetic systems; part of broader context of unconventional embodiments.
- Future AI that may be rational, autonomous, and possibly conscious but lack affective consciousness.
- Higher-level systems built on top of LLMs that produce and consume representations beyond next-token prediction; proposed as potential candidates for consciousness.
- The project of ensuring AI systems do not harm humans (and other animals); sometimes in tension with AI welfare.