claim
active
claim:performing-care-is-not-the-same-as-having-care-models-optimized-to-seem-like-they-have-inner-life-score-lower-than-models-never-trained-for-it
Performing care is not the same as having care: models optimized to seem like they have inner life score lower than models never trained for it.
Interpretive claim supported by roleplay and empathy model results
Source paper
extracted_from · (2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Findings (4)
finding
- Tests SCI framework: empathy-trained model scores lowest on care_signal, contradicting surface prediction
- Euryale 70B (roleplay LoRA on Llama 3.3 70B) scores 1.81, below its base model Llama 3.3 70B at 1.91. · supports · Demonstrates that roleplay fine-tuning actively suppresses self-observation rather than merely having no effect.
- Cited as activation-level support for the performing care vs having care distinction the battery detects behaviorally
- Character training suppresses boundary_awareness: a model can act as though it cares without observing the boundary between its performance and the user.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- H2: Performing care is not the same as having care signal; models trained for care performance will score lower on care_signal. · hypothesis · 0.861 · Confirmatory hypothesis supported by the Inflection Pi result.
- Interpretation supported by Inflection Pi's low care_signal despite empathy training, and SCI framework distinction.
- A key distinction: models trained to perform caring output score lower on care_signal than models with genuine self-observation
- Summarizes the SCI loop dynamics.
- Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models. · finding · 0.758 · Discriminant validity finding: Euryale (roleplay on Llama 70B) scores 1.81 vs. base Llama at 1.91. Roleplay training suppresses self-observation.