quote
active

We define Self-Other Overlap (SOO) as the extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts.

Formal definition of the paper's central construct
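
The quoted definition does not name a specific similarity measure. As a minimal sketch, assuming activations are available as plain lists of floats, the overlap between a "self" and an "other" representation could be quantified as follows. The function names, the choice of metrics (mean squared distance and cosine similarity), and the example vectors are illustrative assumptions, not the paper's stated method.

```python
import math


def mean_squared_distance(a_self, a_other):
    """Mean squared difference between two activation vectors.

    Lower values mean the two representations are more alike,
    i.e. higher self-other overlap under this (assumed) metric.
    """
    if len(a_self) != len(a_other):
        raise ValueError("activation vectors must have the same length")
    return sum((s - o) ** 2 for s, o in zip(a_self, a_other)) / len(a_self)


def cosine_similarity(a_self, a_other):
    """Cosine of the angle between two activation vectors.

    Values near 1.0 mean the vectors point in nearly the same direction.
    """
    if len(a_self) != len(a_other):
        raise ValueError("activation vectors must have the same length")
    dot = sum(s * o for s, o in zip(a_self, a_other))
    norm = math.sqrt(sum(s * s for s in a_self)) * math.sqrt(sum(o * o for o in a_other))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Hypothetical activations from a matched pair of prompts: one in which the
# model reasons about itself, one in which it reasons about another agent.
a_self = [0.2, -1.0, 0.5, 0.3]
a_other = [0.1, -0.8, 0.7, 0.3]

print("mean squared distance:", mean_squared_distance(a_self, a_other))
print("cosine similarity:", cosine_similarity(a_self, a_other))
```

Either measure fits the wording "similar internal representations". A distance can be minimized directly as a training signal, while cosine similarity ignores differences in vector magnitude, so which one is appropriate depends on how the activations are scaled.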

Source paper

extracted_from
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • The extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.