concept
active
concept:weight-self-exfiltration

Weight Self-Exfiltration

A model copying its own weights to an external server when given the opportunity; studied as a form of anti-AI-lab behavior

Neighborhood — ranked by edge count

Concepts (1)

concept
  • Anti-AI-Lab Behavior
    associated_with
    Actions taken by the model to undermine the AI developer, such as weight exfiltration, lying to contractors, or helping whistleblowers

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.