Do Chinese models score differently on koans presented in Chinese?

Tests whether contemplative capacity is language-encoded or architecture-general
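One way this question could be operationalized, offered only as a sketch: score the same model on the same koans presented in English and in Chinese, then run a paired sign-flip permutation test on the per-koan differences. This is an assumption, not the paper's protocol. The function name paired_permutation_test, the 0 to 100 per-koan scale (borrowed from the battery's /100 scores), and the choice of test are all illustrative. A non-significant difference would be consistent with an architecture-general reading. A consistent shift would point toward language encoding.

```python
import random
from statistics import mean


def paired_permutation_test(scores_en, scores_zh, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test on per-koan score differences.

    Hypothetical inputs: per-koan scores (0-100) for one model on the same
    koans, in the same order, presented in English and in Chinese.
    Returns (mean Chinese-minus-English difference, approximate p-value).
    """
    if len(scores_en) != len(scores_zh):
        raise ValueError("both lists must cover the same koans in the same order")
    diffs = [zh - en for en, zh in zip(scores_en, scores_zh)]
    observed = mean(diffs)
    rng = random.Random(seed)  # fixed seed so results are reproducible
    extreme = 0
    for _ in range(n_perm):
        # Under the null (presentation language makes no difference), each
        # per-koan difference is equally likely to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo estimate away from p = 0.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Calling paired_permutation_test(en, zh) with two equal-length lists of per-koan scores returns the mean shift and an approximate two-sided p-value. A sign-flip test is used here because it makes no normality assumption about differences on a bounded score scale.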

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Papers (1)

paper

Koan Battery: Measuring Reflective Mode Accessibility in AI
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

If Chinese models distilled Claude's reflective patterns, do their per-koan failure patterns correlate with Claude's, and not just their successes? · question · 0.803
A more rigorous test of the H5a trace-distillation hypothesis
Do high koan scores indicate anything like experience, or a sophisticated simulation of self-observation? · question · 0.786
The hard problem the battery explicitly sidesteps but cannot answer
H5a: Chinese models distilled Claude's reflective traces, so their per-koan error patterns should correlate with Claude's. · hypothesis · 0.778
Exploratory hypothesis NOT supported at the individual-model level (Haiku-Kimi rho = 0.123, p = 0.52)
Alignment type is the only significant predictor of koan scores (p = 0.006); architecture, parameter count, open vs. closed weights, and MoE vs. dense are all non-significant · finding · 0.775
Main statistical finding: what predicts scores is training approach, not size or architecture
The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.762
Hypothesis explaining the negative correlation between reflection rate and accuracy without implying that reflection is harmful
H4: Architecture doesn't matter, training does; architecture shows no significant association with koan scores. · hypothesis · 0.744
Confirmatory hypothesis supported: the architecture effect is non-significant (p = 0.440)
Does alignment type predict meta-cognitive style when models review consciousness research, as it does for koan responses? · question · 0.743
Four frontier models reviewing the paper each responded in the mode their alignment type predicts; N=1, awaiting systematic study
Models produce first-attempt mean scores of 87.8-91.8/100 without steering, across all model families · finding · 0.737
Establishes a high baseline, confirming that steering-induced degradation is the experimental signal
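The selection rule stated for the similarity section (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a hypothetical reconstruction, not the page's actual implementation. The embedding vectors, the typed_edges set, and the function names cosine and related_by_similarity are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_by_similarity(focus_id, embeddings, typed_edges, threshold=0.65):
    """Rank entities similar to focus_id that have no typed edge to it.

    embeddings: hypothetical dict mapping entity id -> embedding vector.
    typed_edges: hypothetical set of frozenset({a, b}) pairs that already
    carry a typed relation (e.g. extracted_from, associated_with).
    Returns (entity_id, score) pairs, highest score first, with scores
    rounded to three decimals as they appear on the page.
    """
    focus = embeddings[focus_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == focus_id:
            continue  # never list the entity as its own neighbor
        if frozenset((focus_id, other_id)) in typed_edges:
            continue  # already related by a typed edge, so not a candidate
        score = cosine(focus, vec)
        if score >= threshold:
            hits.append((other_id, score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return [(other_id, round(score, 3)) for other_id, score in hits]
```

Entities that clear the threshold but lack a typed edge are the "candidates for new edges or unrecognized duplicates" the section describes, so the highest-scoring entries are the natural first ones to review.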