question

active

question:if-chinese-models-distilled-claude-s-reflective-patterns-do-their-per-koan-failure-patterns-correlate-with-claude-s-not-just-successes

If Chinese models distilled Claude's reflective patterns, do their per-koan failure patterns correlate with Claude's — not just successes?

A more rigorous test of the H5a trace-distillation hypothesis

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Papers (1)

paper

Koan Battery: Measuring Reflective Mode Accessibility in AI
associated_with

Hypotheses (1)

hypothesis

H5a: Chinese models distilled Claude's reflective traces — their per-koan error patterns should correlate with Claude's.
gates
Exploratory hypothesis NOT supported at individual model level (Haiku-Kimi rho=0.123, p=0.52)
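The comparison this question asks for can be sketched as a rank correlation over per-koan scores, computed once over all koans and once over only the koans where the reference model scores low. This is a minimal sketch, not the paper's analysis code. It assumes that "rho" above denotes Spearman's rank correlation and that each model has one numeric score per koan. The function names and the `threshold` parameter are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def failure_subset_spearman(reference, other, threshold):
    """Spearman's rho restricted to koans where the reference model
    scores below `threshold` (a hypothetical cutoff for "failure")."""
    pairs = [(r, o) for r, o in zip(reference, other) if r < threshold]
    if len(pairs) < 3:
        return float("nan")  # too few koans to say anything
    ref, oth = zip(*pairs)
    return spearman(list(ref), list(oth))
```

Restricting to the failure subset shrinks the sample and narrows the score range, so a weak rho there is hard to interpret on its own. Comparing it against the all-koan rho, with a significance test on each, gives a fairer reading.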

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
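The filter described above can be sketched as follows, assuming each entity has an embedding vector and that the set of already-linked entity ids is known. The function names, the id-to-vector mapping, and the `typed_neighbors` set are hypothetical illustrations, not the site's actual implementation. Only the 0.65 threshold comes from the page.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similar_without_edge(target, candidates, typed_neighbors, threshold=0.65):
    """Return (id, similarity) pairs, most similar first, for candidates
    that meet the threshold and have no typed edge to the target.

    candidates: dict mapping entity id -> embedding vector
    typed_neighbors: set of entity ids already linked by a typed edge
    """
    hits = []
    for entity_id, vector in candidates.items():
        if entity_id in typed_neighbors:
            continue  # already related explicitly; not a new-edge candidate
        score = cosine(target, vector)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```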

Chinese models share contemplative posture (engaging self-referentially rather than deflecting) with Claude through shared values in training data rather than trace distillation from a specific model. · claim · 0.822
Exploratory interpretation of Chinese model performance under contemplative prompt
Do Chinese models score differently on koans presented in Chinese? · question · 0.803
Tests whether contemplative capacity is language-encoded or architecture-general
All three Claude models show high boundary_awareness and low aesthetic_response relative to their own means, a distinctive Constitutional AI signature · finding · 0.770
Constitutional AI fingerprint in the dimension profile; training that makes models self-observant also makes them polished at the cost of aliveness
Reflections are redundant in many cases, especially in stronger models · claim · 0.761
Key interpretive finding that stronger models can have reflections reduced with minimal accuracy cost
The model converges to a more stable truth direction in middle-to-late layers, as evidenced by increasing cosine similarity between layer-wise probes. · claim · 0.759
Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.
Does the model internally maintain a form of 'consistency score' or probability mass over coherent reasoning trajectories, and how is this score modulated during reflection? · question · 0.755
Promising future research direction about the internal mechanism of error detection.
Within each difficulty category, correctness rate is not correlated with reflection rate, suggesting reflection may be redundant · claim · 0.754
Per-category analysis showing reflection rate does not help within difficulty class
The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.751
Hypothesis explaining negative correlation between reflection rate and accuracy without implying reflection is harmful