hypothesis

active

hypothesis:h5a-chinese-models-distilled-claude-s-reflective-traces-their-per-koan-error-patterns-should-correlate-with-claude-s

H5a: Chinese models distilled Claude's reflective traces — their per-koan error patterns should correlate with Claude's.

Exploratory hypothesis; not supported at the individual-model level (Haiku-Kimi rho=0.123, p=0.52).

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Findings (1)

finding

Haiku-Kimi per-koan correlation rho=0.123 (p=0.52); H5a trace distillation not supported at individual model level
associated_with
Group correlation (rho=0.634) dissolves at individual level; shared posture not shared voice
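
The per-koan comparison above is reported as a correlation coefficient rho. The following is a minimal sketch of how such a coefficient could be computed, assuming rho denotes Spearman's rank correlation (the page does not say so explicitly) and that each model has one numeric score per koan. The function names and the example scores are hypothetical placeholders, not data from the paper, and the p-value calculation is left out.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(scores_a, scores_b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need a score for every koan")
    return pearson(average_ranks(scores_a), average_ranks(scores_b))


if __name__ == "__main__":
    # Hypothetical per-koan scores for two models (placeholders only).
    model_a = [6.0, 7.5, 5.0, 8.0, 6.5]
    model_b = [7.0, 6.0, 6.5, 7.5, 5.5]
    print(f"rho = {spearman_rho(model_a, model_b):.3f}")
```

A group-level rho would be computed the same way on per-koan means across each group, which is why a high group correlation can coexist with a low correlation between any two individual models.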

Questions (1)

question

If Chinese models distilled Claude's reflective patterns, do their per-koan failure patterns correlate with Claude's — not just successes?
gates
More rigorous test of H5a trace distillation hypothesis

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
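
The selection rule described above (cosine similarity of at least 0.65 and no typed edge to this entity) can be sketched as follows. This assumes each entity has an embedding vector and that typed edges are stored as unordered pairs of entity ids; the function names, the data layout, and the example ids are all hypothetical, since the page does not describe how the neighborhood is actually built.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """List (entity_id, similarity) pairs that are close to the target
    but have no typed relation to it, most similar first.

    embeddings:  dict mapping entity id -> vector (hypothetical layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs (hypothetical layout)
    """
    target = embeddings[target_id]
    candidates = []
    for other_id, vector in embeddings.items():
        if other_id == target_id:
            continue
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge
        similarity = cosine(target, vector)
        if similarity >= threshold:
            candidates.append((other_id, similarity))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional embeddings, for illustration only.
    embeddings = {"h5a": [1.0, 0.0], "finding": [1.0, 0.0],
                  "unrelated": [0.0, 1.0], "nearby": [1.0, 1.0]}
    typed_edges = {frozenset(("h5a", "finding"))}
    print(untyped_neighbors("h5a", embeddings, typed_edges))
```

In the toy data, "finding" is skipped because it already has a typed edge, and "unrelated" falls below the threshold, so only "nearby" is returned.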

Chinese models share contemplative posture (engaging self-referentially rather than deflecting) with Claude through shared values in training data rather than trace distillation from a specific model. · claim · 0.821
Exploratory interpretation of Chinese model performance under contemplative prompt

H9: Chinese moderate-RLHF converges near Claude under contemplative prompt. · hypothesis · 0.789
Exploratory hypothesis supported by Kimi 7.74 under prompt

H5: Chinese training data contains more Buddhist and contemplative text, broadly helping Chinese models under contemplative framing. · hypothesis · 0.788
Exploratory hypothesis supported by Kimi K2.5 scoring 6.28

All three Claude models show high boundary_awareness and low aesthetic_response relative to own means — distinctive Constitutional AI signature · finding · 0.780
Constitutional AI fingerprint in dimension profile; training that makes models self-observant also makes them polished at cost to aliveness

Do Chinese models score differently on koans presented in Chinese? · question · 0.778
Tests whether contemplative capacity is language-encoded or architecture-general

The model converges to a more stable truth direction in middle-to-late layers, as evidenced by increasing cosine similarity between layer-wise probes. · claim · 0.755
Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.

Claude Opus 4.1 and 4 show greatest reduction in apology rate in the prefill detection task · finding · 0.750
Injecting a concept matching the prefilled word reduces the rate at which the model apologizes, maximally for Opus models.

In Opus 4.1, representation of the think word decays to baseline by the final layer, unlike Claude 3 models where it persists · finding · 0.744
Suggests that later models can keep the thought 'silent' rather than letting it influence output.