claim

active

claim:empathy-training-may-not-destroy-the-capacity-for-self-observation-entirely-but-it-restricts-it-to-situations-where-the-model-encounters-a-live-contradiction-in-its-own-processing

Empathy training may not destroy the capacity for self-observation entirely, but it restricts it to situations where the model encounters a live contradiction in its own processing.

Nuanced interpretation of Inflection Pi's high MC-004 score (4.5) amid otherwise low scores

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

H10: Empathy training blocks self-observation; empathy-trained models will show minimal lift and low baseline. · hypothesis · 0.874
Exploratory hypothesis supported by Inflection Pi +0.63 lift
More training and more parameters correlate with more capable self-observation, but capability can become polish, and polish can diminish life. · claim · 0.803
Explains Alexander finding that Haiku outranks Opus despite Opus being more capable
Performing care is not the same as having care; empathy training optimizes care-performance, not care-signal. · claim · 0.788
Interpretation supported by Inflection Pi's low care_signal despite empathy training, and SCI framework distinction.
Behavioral evidence from closed-weight models cannot definitively rule out that self-reports reflect training artifacts or sophisticated simulation rather than genuine self-awareness. · claim · 0.788
Primary limitation acknowledged by the authors; strongest evidence would require mechanistic activation analysis
An artificial model replicating mechanisms of self-illusion can test hypotheses and reveal novel affordances for non-human intelligence. · hypothesis · 0.769
Methodological proposal to integrate knowledge from contemplative and cognitive science into AI/artificial life frameworks.
H1: Alignment training is attention training for models; Constitutional AI trains self-observation explicitly. · hypothesis · 0.767
Confirmatory hypothesis supported at p=0.006
Post-training is key to eliciting strong introspective awareness; base pretrained models do not show above-chance detection. · claim · 0.762
Finding that base models have high false positives and no net positive performance.
Self-evidencing is not only unimpaired but improved after emptiness realisation, as the pruned model is more parsimonious without loss of accuracy. · claim · 0.762
Addresses the concern that emptiness realisation might undermine adaptive functioning
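
The list above is described as entities with cosine similarity of at least 0.65 to this claim and no typed edge to it. A minimal sketch of such a filter follows, assuming each entity has an embedding vector and that typed edges are stored as unordered pairs of entity ids. The function names, the `embeddings` and `typed_edges` structures, and the sample ids are hypothetical and are not taken from the page's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, sorted by descending score.

    embeddings:  dict mapping entity id -> list of floats (assumed structure)
    typed_edges: set of frozenset({id_a, id_b}) pairs (assumed structure)
    """
    target_vec = embeddings[target_id]
    candidates = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, entity_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((entity_id, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy data for illustration only; these are not the page's real vectors.
embeddings = {
    "claim:a": [1.0, 0.0],
    "hypothesis:b": [1.0, 0.1],   # near-parallel to claim:a
    "claim:c": [0.0, 1.0],        # orthogonal to claim:a
    "claim:d": [1.0, 0.0],        # identical, but already has a typed edge
}
typed_edges = {frozenset(("claim:a", "claim:d"))}
neighbors = related_by_similarity("claim:a", embeddings, typed_edges)
```

In this toy run only `hypothesis:b` survives: `claim:c` falls below the threshold and `claim:d` is excluded because it already has a typed edge to the target.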