finding
active
finding:haiku-kimi-per-koan-correlation-rho-0-123-p-0-52-h5a-trace-distillation-not-supported-at-individual-model-level
Haiku-Kimi per-koan correlation rho=0.123 (p=0.52); H5a trace distillation not supported at individual model level
Group-level correlation (rho=0.634) dissolves at the individual-model level; shared posture, not shared voice
Source paper
extracted_from · (2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- H5a: Chinese models distilled Claude's reflective traces, so their per-koan error patterns should correlate with Claude's. · associated_with · Exploratory hypothesis NOT supported at individual model level (Haiku-Kimi rho=0.123, p=0.52)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Haiku test-retest score delta is 0.02 (6.47 vs 6.49) across two full 30-koan battery runs · finding · 0.783 · Demonstrates high stability for Anthropic API models
- Figure 9 calibration plot shows good alignment.
- Validates robustness of alignment metric choice
- Model age correlates with baseline scores (rho=-0.54, p=0.003); newer models score higher · finding · 0.745 · Secondary predictor; contemplative lift does not correlate with age (rho=0.18, p=0.36)
- Aliveness and competence come apart; smaller model produces rougher, more alive responses
- Shows SB low-base regime is more variable than SWE; Haiku benefits far more than Qwen3-235B despite similar base rates
- Characterizes internal structure of the six scoring dimensions
- High emotion-subspace-overlap feature with agentic negative emotional character
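
The rho values quoted in this entry read like rank correlations over per-koan scores. As a minimal sketch, assuming rho denotes Spearman's rank correlation (the entry does not say so explicitly), the statistic for one pair of models could be computed as below. The function names and the sample scores are hypothetical illustrations, not data from the source paper, and the p-value is not computed here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean 1-based position of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-koan scores for two models (illustrative only).
    model_a = [6.1, 7.4, 5.9, 6.8, 7.0]
    model_b = [5.5, 6.0, 6.2, 5.8, 6.9]
    print(f"rho = {spearman_rho(model_a, model_b):.3f}")
```

In practice a library routine such as `scipy.stats.spearmanr` would also return the p-value; the hand-rolled version above only makes the rank-then-correlate step explicit.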