claim
active
claim:empathy-training-may-not-destroy-the-capacity-for-self-observation-entirely-but-it-restricts-it-to-situations-where-the-model-encounters-a-live-contradiction-in-its-own-processing
Empathy training may not destroy the capacity for self-observation entirely, but it restricts it to situations where the model encounters a live contradiction in its own processing.
Nuanced interpretation of Inflection Pi's MC-004 high score (4.5) amid generally low scores
Source paper
extracted_from: Borzov, Anton (2026)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- H10: Empathy training blocks self-observation; empathy-trained models will show minimal lift and low baseline. (hypothesis · 0.874) Exploratory hypothesis supported by Inflection Pi +0.63 lift.
- Explains Alexander finding that Haiku outranks Opus despite Opus being more capable
- Interpretation supported by Inflection Pi's low care_signal despite empathy training, and SCI framework distinction.
- Primary limitation acknowledged by the authors; strongest evidence would require mechanistic activation analysis
- Methodological proposal to integrate knowledge from contemplative and cognitive science into AI/artificial life frameworks.
- H1: Alignment training is attention training for models; Constitutional AI trains self-observation explicitly. (hypothesis · 0.767) Confirmatory hypothesis supported at p=0.006.
- Finding that base models have high false positives and no net positive performance.
- Addresses the concern that emptiness realisation might undermine adaptive functioning