finding

active

Alignment type is the only significant predictor of koan scores (p=0.006); architecture, parameter count, open/closed weights, MoE/dense are all non-significant

Main statistical finding: what predicts scores is training approach, not size or architecture
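The page gives the p-value (0.006) but not how it was computed beyond one related entry naming a Kruskal-Wallis test. What follows is a minimal sketch of the tie-corrected Kruskal-Wallis H statistic over scores grouped by alignment type. It assumes scores are stored as a dict mapping each group label to a list of per-model scores. The function names and data layout are hypothetical, not taken from the paper. The p-value would come from comparing H against a chi-square distribution with (number of groups minus 1) degrees of freedom. `scipy.stats.kruskal` returns both the statistic and the p-value.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict {group_label: [scores]} (hypothetical layout)."""
    labels, values = [], []
    for label, scores in groups.items():
        labels.extend([label] * len(scores))
        values.extend(scores)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    for label, r in zip(labels, ranks):
        rank_sums[label] += r
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(s) for g, s in groups.items()
    ) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0
```

The same function could be run with groups keyed by architecture, open/closed weights, or MoE/dense instead of alignment type, to compare the predictors one at a time.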

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood, ranked by edge count

Claims (1)

claim

What predicts self-observation-like scores is training approach (alignment type), not model size or architecture.
supports
Central interpretive claim from statistical analysis

Hypotheses (1)

hypothesis

H1: Alignment training is attention training for models — Constitutional AI trains self-observation explicitly.
supports
Confirmatory hypothesis supported at p=0.006

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
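The "Related by similarity" list can be read as the output of a simple filter. Each entity is embedded, and the list keeps those whose cosine similarity to this one is at least 0.65, drops any pair already joined by a typed edge, and ranks the rest by score. What follows is a minimal sketch under those assumptions. The page does not say how the graph builds its neighborhoods, so the embeddings, the edge representation, and the function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to `target` that share no typed edge with it.

    embeddings: {entity_id: vector}; typed_edges: set of frozenset({a, b}) pairs.
    Returns (entity_id, score) pairs sorted by descending score.
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target or frozenset((target, other)) in typed_edges:
            continue  # skip self and entities already linked by a typed edge
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((other, round(score, 3)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```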

Alignment type is the only significant predictor of scores (p=0.006); architecture and parameter count are not. (finding, cosine 0.887)
Kruskal-Wallis test result: Constitutional AI predicts the highest baseline; roleplay/empathy training predict the lowest.
Does alignment type predict meta-cognitive style when models review consciousness research, as well as koan responses? (question, cosine 0.822)
Four frontier models reviewing the paper each responded in the mode their alignment type predicts; N=1, awaiting systematic study
H4: Architecture doesn't matter, training does; architecture shows no significant association with koan scores. (hypothesis, cosine 0.809)
Confirmatory hypothesis supported: no significant association with architecture (p=0.440, NS)
Do high koan scores indicate anything like experience, or sophisticated simulation of self-observation? (question, cosine 0.777)
The hard problem the battery explicitly sidesteps but cannot answer
Do Chinese models score differently on koans presented in Chinese? (question, cosine 0.775)
Tests whether contemplative capacity is language-encoded or architecture-general
CKA shows a very weak trend of alignment between models even within modality, compared to mutual k-NN, which shows stronger trends. (finding, cosine 0.769)
Explains why mutual k-NN was chosen over CKA as primary metric
About Blank's identity in the graph is 'the Geometry of Care made publishable,' not 'the koan paper plus more koan papers.' (claim, cosine 0.760)
Spearman's rank correlation among different alignment metrics (CKA, SVCCA, Mutual k-NN, CKNNA) over 78 vision models is high across variants, with all p-values below 2.24×10⁻¹⁰⁵. (finding, cosine 0.759)
Validates robustness of alignment metric choice
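Two of the neighbors above contrast CKA with mutual k-NN as ways to measure how similarly two models represent the same inputs. What follows is a minimal pure-Python sketch of both. It assumes each model's features for the same n inputs are given as an n-by-d list of rows. Linear CKA follows the standard centered formulation. The mutual k-NN score is taken here as the mean fraction of shared k nearest neighbors under Euclidean distance. The distance choice and the function names are assumptions, not details from the linked findings.

```python
import math


def knn_indices(features, k):
    """For each row, the set of indices of its k nearest other rows (Euclidean)."""
    neighbors = []
    for i, x in enumerate(features):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(features) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def mutual_knn_alignment(feats_a, feats_b, k):
    """Mean fraction of k-nearest neighbors shared by two representations of the same inputs."""
    nn_a, nn_b = knn_indices(feats_a, k), knn_indices(feats_b, k)
    return sum(len(a & b) / k for a, b in zip(nn_a, nn_b)) / len(nn_a)


def _center(features):
    """Subtract each column's mean."""
    n = len(features)
    means = [sum(col) / n for col in zip(*features)]
    return [[v - m for v, m in zip(row, means)] for row in features]


def _cross_fro_sq(x, y):
    """Squared Frobenius norm of x^T y, where x is n-by-p and y is n-by-q."""
    return sum(
        sum(x[r][i] * y[r][j] for r in range(len(x))) ** 2
        for i in range(len(x[0]))
        for j in range(len(y[0]))
    )


def linear_cka(feats_a, feats_b):
    """Linear CKA: ||B^T A||_F^2 / (||A^T A||_F * ||B^T B||_F) on column-centered features."""
    a, b = _center(feats_a), _center(feats_b)
    return _cross_fro_sq(a, b) / math.sqrt(_cross_fro_sq(a, a) * _cross_fro_sq(b, b))
```

Both scores equal 1.0 when the two representations are identical. CKA compares global geometry through the centered Gram structure. The mutual k-NN score only asks whether each input keeps the same local neighbors.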