quote
active
We define Self-Other Overlap (SOO) as the extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts.
Formal definition of the paper's central construct
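The definition above can be illustrated with a minimal sketch. The quote does not name a similarity metric or say how the representations are obtained, so everything here is an assumption made for illustration: the hypothetical function `self_other_overlap`, the use of cosine similarity, and the idea that each context yields one activation vector for a self-referencing prompt and one for a matched other-referencing prompt.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length activation vectors."""
    if len(a) != len(b):
        raise ValueError("activation vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def self_other_overlap(self_acts, other_acts):
    """Mean cosine similarity over matched self/other contexts.

    Assumption (not from the source): self_acts[i] and other_acts[i] are
    activation vectors recorded on the same context, once phrased about
    the model itself and once phrased about another agent.
    """
    if not self_acts or len(self_acts) != len(other_acts):
        raise ValueError("need a non-empty, equal number of self/other vectors")
    sims = [cosine_similarity(s, o) for s, o in zip(self_acts, other_acts)]
    return sum(sims) / len(sims)


# Toy vectors, not real model activations.
self_acts = [[1.0, 0.0], [0.0, 2.0]]
other_acts = [[1.0, 0.0], [2.0, 0.0]]
overlap = self_other_overlap(self_acts, other_acts)
```

Under this sketch, a score near 1 would indicate that the self and other representations are nearly aligned in every matched context, and a lower score would indicate that they diverge. Other distance measures could be substituted without changing the idea.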
Source paper
extracted_from · (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Self-Other Overlap · about · The extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
- Cross-domain analogical claim linking neuroscience findings to AI design
- Claim supported by Perspectives scenario results showing near-100% accuracy post-fine-tuning
- Mechanistic explanation for why SOO reduces deception
- Related technique improving multi-agent learning by predicting others' actions using an agent's own policy
- Neural self-other overlap provides a hard-to-fake metric for classifying deceptive vs honest agents · claim · 0.779 · Claim that SOO is particularly useful as a detection metric because it is based on latent representations rather than observable behavior
- Selves can be nested and overlapping, cooperating and competing both laterally and across levels. · claim · 0.772 · Key TAME claim that biological systems are patchworks of agents, with higher Selves deforming option spaces for lower ones.
- Foundational claim of the paper, defining self-evidencing.