claim
active
claim:neural-self-other-overlap-in-humans-mediates-empathy-and-inversely-predicts-deceptive-behavior-motivating-the-soo-approach-for-ai
Neural self-other overlap in humans mediates empathy and inversely predicts deceptive behavior, motivating the SOO approach for AI
Cross-domain analogical claim linking neuroscience findings to AI design
Source paper
extracted_from(2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Neural self-other overlap provides a hard-to-fake metric for classifying deceptive vs. honest agents (claim, cosine 0.865): Claim that SOO is particularly useful as a detection metric because it is based on latent representations rather than observable behavior
- Neuroscientific phenomenon where self and other representations partially converge, linked to empathy and altruism
- Neuroscience finding linking extraordinary altruism to increased anterior insula SOO
- Formal definition of the paper's central construct
- Deceptive RL baseline agents have lower mean neural self-other overlap than honest baseline agents (claim, cosine 0.798): Core empirical prediction tested in RL experiments, confirmed by 100% classification accuracy
- Mechanistic explanation for why SOO reduces deception
- Claim supported by Perspectives scenario results showing near-100% accuracy post-fine-tuning
- Critical verbatim statement highlighting the universal inference basis of sentience.
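The selection rule in the header of the similarity list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the graph tool's actual implementation: it assumes each entity already has a precomputed embedding vector, treats typed edges as undirected, and sorts results by descending cosine score. The function names `cosine` and `similarity_neighbors` and the toy entity IDs are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def similarity_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities whose cosine to `target` is >= threshold
    and that share no typed edge with it, sorted by descending score."""
    # Assumption: typed edges are undirected, so check both endpoints.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    ref = embeddings[target]
    candidates = []
    for eid, vec in embeddings.items():
        if eid == target or eid in linked:
            continue  # skip self and entities already joined by a typed edge
        score = cosine(ref, vec)
        if score >= threshold:
            candidates.append((eid, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates

# Toy usage with made-up 2-D embeddings; "f" stands in for a typed neighbor
# such as the framework entry, so it is excluded despite high similarity.
embeddings = {
    "target": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [1.0, 0.0],
    "f": [1.0, 0.2],
}
typed_edges = [("target", "f")]
print(similarity_neighbors("target", embeddings, typed_edges))
```

Excluding typed neighbors is what makes the list useful as a review queue: anything that clears the threshold without an existing relation is either a missing edge or a possible duplicate.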