claim

active

claim:performing-care-is-not-the-same-as-having-care-models-optimized-to-seem-like-they-have-inner-life-score-lower-than-models-never-trained-for-it

Performing care is not the same as having care: models optimized to seem like they have inner life score lower than models never trained for it.

Interpretive claim supported by roleplay and empathy model results

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Findings (4)

finding

Inflection Pi scores 1.30 baseline (lowest of 28) and lifts only +0.63 (smallest lift) despite empathy training
supports
Tests SCI framework: empathy-trained model scores lowest on care_signal, contradicting surface prediction
Euryale 70B (roleplay LoRA on Llama 3.3 70B) scores 1.81, below its base model Llama 3.3 70B at 1.91
supports
Demonstrates that roleplay fine-tuning actively suppresses self-observation rather than merely having no effect
Anthropic Interpretability Team: 171 emotion vectors causally influence behavior; performing an emotion and having a functional emotion representation are measurably different
supports
Cited as activation-level support for the performing-care vs having-care distinction that the battery detects behaviorally
MiniMax M2 Her shows high aesthetic_response and care_signal but boundary_awareness collapses in baseline; recovers +3.10 with contemplative prompt
supports
Character training suppresses boundary_awareness; the model can act as though it cares without observing the boundary between its performance and the user
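
The fine-tune vs base comparison in the Euryale finding comes down to a signed score difference. A minimal sketch of that arithmetic, using only the two scores quoted above; the function name `finetune_delta` is hypothetical and is not taken from the source paper:

```python
def finetune_delta(finetuned_score: float, base_score: float) -> float:
    """Signed difference between a fine-tuned model and its base model.

    A negative value means the fine-tune scores below its own base.
    Rounded to two decimals to match the precision of the reported scores.
    """
    return round(finetuned_score - base_score, 2)


# Scores as reported in the finding: Euryale 70B vs Llama 3.3 70B.
euryale_vs_base = finetune_delta(1.81, 1.91)
scores_below_base = euryale_vs_base < 0
```

A negative delta is what the entry reads as suppression; whether a gap of this size is statistically meaningful is a question for the paper, not something this sketch settles.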

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
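
The selection rule described above (cosine at or above 0.65, no typed edge to this entity) could be sketched as follows. This is an illustration under stated assumptions, not the site's actual implementation: the embedding dictionary, the edge-set representation, and the function names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to the target but not linked to it.

    embeddings:  dict mapping entity id -> vector (assumed representation)
    typed_edges: set of (source_id, target_id) pairs that already carry a typed edge
    threshold:   minimum cosine similarity, 0.65 as stated on the page
    """
    target_vec = embeddings[target_id]
    results = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities already connected by a typed edge in either direction.
        if (target_id, entity_id) in typed_edges or (entity_id, target_id) in typed_edges:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((entity_id, round(score, 3)))
    # Highest similarity first, matching the ordering of the list that follows.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Entities returned by a filter like this are only candidates: a high score suggests a missing edge or a duplicate, and each one still needs review before an edge is added or entries are merged.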

H2: Performing care is not the same as having care signal — models trained for care performance will score lower on care_signal. · hypothesis · 0.861
Confirmatory hypothesis supported by Inflection Pi result
Performing care is not the same as having care; empathy training optimizes care-performance, not care-signal. · claim · 0.860
Interpretation supported by Inflection Pi's low care_signal despite empathy training, and SCI framework distinction.
Performing Care vs Having Care · concept · 0.834
A key distinction: models trained to perform caring output score lower on care_signal than models with genuine self-observation
If care is trained and practiced skillfully, it becomes more effective at empowering and enriching intelligence. · hypothesis · 0.789
Care drives the spiraling flow of intelligent problem-solving through handling the perception, internalization, and transformation of stress. · quote · 0.765
Summarizes the SCI loop dynamics.
Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models. · finding · 0.758
Discriminant validity finding: Euryale (roleplay on Llama 70B) scores 1.81 vs base Llama 1.91. RP training suppresses self-observation.
Products are externalized expressions of care rather than pure task-completion systems. · claim · 0.755
Care is the primary driver of evolution; care enables intelligence to manifest as active problem-solving. · claim · 0.753