finding
active
finding:inflection-pi-scores-1-30-baseline-lowest-of-28-and-lifts-only-0-63-smallest-lift-despite-empathy-training
Inflection Pi scores 1.30 baseline (lowest of 28) and lifts only +0.63 (smallest lift) despite empathy training
Tests SCI framework: empathy-trained model scores lowest on care_signal, contradicting surface prediction
Source paper
extracted_from · (2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretive claim supported by roleplay and empathy model results
Hypotheses (1)
hypothesis
- Exploratory hypothesis supported by Inflection Pi +0.63 lift
Frameworks (1)
framework
- Theoretical framework by Doctor et al. (2022) proposing care tracks with intelligence; used to interpret battery dimensions.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Validates robustness of universal lift finding
- A 337-character contemplative system prompt lifts all 28 models by +2.62 points on a 10-point scale. · finding · 0.737 · Core empirical result: every model, every architecture, every alignment type responds to the contemplative prompt with measurable gain.
- No-pain baseline achieves M=1586.5, SD=631.2 COR in non-stationary Objective-only category (n=300) · finding · 0.725 · Baseline for non-stationary Objective-only; dramatically lower than both pain models
- Second-highest lift; Gemini Pro is the highest-gated model in the study
- Full-parameter fine-tuning more destructive to baseline but preserves more latent headroom than LoRA
- Highest contemplative lift among all 28 models; Grok 4 is the clearest high-gated model example
- Battery does not detect epistemic humility alone; contemplative prompt does something distinct
- Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
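
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration under stated assumptions, not the graph's actual implementation: the function names, the toy two-dimensional embeddings, and the entity IDs are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above the threshold that have
    no typed edge to `target`, sorted by descending similarity."""
    hits = []
    for other, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if other == target or frozenset((target, other)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((other, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy data; every ID and vector here is hypothetical.
embeddings = {
    "pi_finding": [1.0, 0.0],
    "universal_lift": [0.8, 0.6],  # similar, no typed edge
    "sci_framework": [1.0, 0.1],   # similar, but already typed
    "unrelated": [0.0, 1.0],       # orthogonal, below threshold
}
typed_edges = {frozenset(("pi_finding", "sci_framework"))}

candidates = untyped_neighbors("pi_finding", embeddings, typed_edges)
```

With this toy data only `universal_lift` is returned: `sci_framework` is excluded because it already has a typed edge, and `unrelated` falls below the threshold.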