Self-Other Overlap

The extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts

Neighborhood — ranked by edge-count

paper

method

Latent SOO Metric
implements
Metric measuring the mean MSE between self- and other-referencing activations, averaged across all hidden MLP/attention layers
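
The metric described above can be sketched as a per-layer mean squared error averaged over layers. This is a minimal illustration, not the paper's implementation. The function names `layer_mse` and `latent_soo`, the list-of-lists activation format, and the equal weighting of layers are all assumptions made for the example. How the activations are extracted from a model is left out.

```python
from statistics import fmean


def layer_mse(self_acts, other_acts):
    """Mean squared error between two equal-length activation vectors."""
    if len(self_acts) != len(other_acts):
        raise ValueError("activation vectors must have the same length")
    return fmean([(s - o) ** 2 for s, o in zip(self_acts, other_acts)])


def latent_soo(self_layers, other_layers):
    """Average the per-layer MSE over all hidden layers.

    Each argument is a list with one activation vector per layer
    (hypothetical format). Layers are weighted equally here, which is
    an assumption of this sketch.
    """
    if len(self_layers) != len(other_layers):
        raise ValueError("both inputs must cover the same layers")
    return fmean([layer_mse(s, o) for s, o in zip(self_layers, other_layers)])


# Toy activations for two layers (invented numbers, for illustration only).
self_layers = [[1.0, 2.0], [0.0, 0.0]]
other_layers = [[1.0, 4.0], [0.0, 2.0]]
score = latent_soo(self_layers, other_layers)
```

Because MSE is a distance, a lower value indicates more overlap between the self- and other-referencing activations, and identical activations give a value of zero.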

concept

Self-Other Distinction
related_to
The implicit capacity the self-prior implements by assigning high density to familiar self-states and low density to non-self states
Neural Self-Other Overlap in Neuroscience
analogous_to
Neuroscientific phenomenon where self and other representations partially converge, linked to empathy and altruism

quote

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Self-Other Boundary · concept · 0.839
Conceptual distinction between self and environment that non-duality dissolves; key target for alignment-by-design
Self-Similarity · concept · 0.817
Structural and functional property exhibited by living systems but currently absent from most engineered machines.
Self-Other Overlap (SOO) Fine-Tuning · framework · 0.813
The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
What Is The Relationship Or Overlap Between The · question · 0.793
Neural self-other overlap provides a hard-to-fake metric for classifying deceptive vs honest agents · claim · 0.782
Claim that SOO is particularly useful as a detection metric because it is based on latent representations rather than observable behavior
Selfing · concept · 0.780
Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
Neural self-other overlap in humans mediates empathy and inversely predicts deceptive behavior, motivating the SOO approach for AI · claim · 0.766
Cross-domain analogical claim linking neuroscience findings to AI design
Self-organisation · concept · 0.758
Phenomenon of spontaneous long-range order emerging from local interactions; central phenomenon explained by topological constraints
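
The selection rule behind this list (cosine ≥ 0.65 and no typed edge to the entity) can be sketched as below. This is an illustration under stated assumptions, not the system that produced the page. The function names, the dictionary of embeddings, and the representation of typed edges as a set of `frozenset` pairs are all hypothetical, and the toy vectors are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with score >= threshold and no typed edge
    to `target`, sorted from most to least similar."""
    hits = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy 2-D embeddings (invented, for illustration only).
embeddings = {
    "target": [1.0, 0.0],
    "near_untyped": [1.0, 0.1],
    "near_typed": [1.0, 1.0],
    "far": [0.0, 1.0],
}
typed_edges = {frozenset(("target", "near_typed"))}
candidates = untyped_neighbors("target", embeddings, typed_edges)
```

In this toy data only `near_untyped` is returned: `near_typed` is similar enough but already has a typed edge, and `far` falls below the threshold.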