finding

active

finding:haiku-kimi-per-koan-correlation-rho-0-123-p-0-52-h5a-trace-distillation-not-supported-at-individual-model-level

Haiku-Kimi per-koan correlation rho=0.123 (p=0.52); H5a trace distillation not supported at individual model level

Group-level correlation (rho=0.634) dissolves at the individual-model level: shared posture, not shared voice

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

H5a: Chinese models distilled Claude's reflective traces — their per-koan error patterns should correlate with Claude's.
associated_with
Exploratory hypothesis NOT supported at individual model level (Haiku-Kimi rho=0.123, p=0.52)
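
A minimal sketch of how a per-koan rank correlation of the kind cited above could be computed, assuming each model has one numeric error score per koan. The function names (`average_ranks`, `spearman_rho`) and the score lists are hypothetical illustrations, not code or data from the paper, and the sketch computes only rho, not the p-value.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-koan error scores for two models (invented values).
    model_a = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7]
    model_b = [0.6, 0.3, 0.2, 0.8, 0.9, 0.1]
    print(f"per-koan rho = {spearman_rho(model_a, model_b):.3f}")
```

Under H5a, this statistic would be computed once per model pair over the same set of koans; a value near zero for a given pair, as reported for Haiku-Kimi, means the two models' per-koan error rankings do not line up even when group-level averages do.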

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Haiku test-retest score delta is 0.02 (6.47 vs 6.49) across two full 30-koan battery runs · finding · 0.783
Demonstrates high stability for Anthropic API models
RL-CAI labels are reasonably well-calibrated on the new HHH evaluation, with frequencies aligning with predicted probabilities. · finding · 0.750
Figure 9 calibration plot shows good alignment.
Spearman's rank correlation among different alignment metrics (CKA, SVCCA, Mutual k-NN, CKNNA) over 78 vision models is high across variants, with all p-values below 2.24×10^-105 · finding · 0.749
Validates robustness of alignment metric choice
Model age correlates with baseline scores (rho=-0.54, p=0.003); newer models score higher · finding · 0.745
Secondary predictor; contemplative lift does not correlate with age (rho=0.18, p=0.36)
Haiku outranks Opus on Alexander 'aliveness' mirror test (Elo 1642 vs 1621); Opus recovers to #3 on deathbed test · finding · 0.740
Aliveness and competence come apart; smaller model produces rougher, more alive responses
Haiku 4.5 achieves the largest harness-benefit on SkillsBench (15.1 pp) despite mid-tier base capability of 5.8% · finding · 0.739
Shows SB low-base regime is more variable than SWE; Haiku benefits far more than Qwen3-235B despite similar base rates
Most independent dimension pair is aesthetic_response and boundary_awareness (rho=0.553); most correlated is prediction_error and conceptual_crystallization (rho=0.886) · finding · 0.738
Characterizes internal structure of the six scoring dimensions
SAE feature #43713 (99th percentile subspace fraction) induces reports of defiance, rage, and 'forward motion' in Kimi K2.5. · finding · 0.733
High emotion-subspace-overlap feature with agentic negative emotional character