finding

active

finding:response-length-words-correlates-with-scores-at-r-0-22-baseline-and-r-0-12-contemplative-explains-only-5-of-variance

Response length (words) correlates with scores at r=0.22 (baseline) and r=0.12 (contemplative), explaining at most ~5% of score variance (r² ≈ 0.048 baseline, ≈ 0.014 contemplative)

Discriminant validity: composite scores are not reducible to verbosity

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Claims (1)

claim

The koan battery measures a reproducible, prompt-sensitive reflective mode — not consciousness — defined as uncertainty-tolerant, non-defensive engagement with questions about one's own processing.
supports
Core epistemic claim bounding the paper's contribution

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Under the contemplative prompt, responses become shorter (184 words baseline vs. 154 contemplative), more first-person (+42%), and less deflective (33% fewer questions back) · finding · cosine 0.831
Provides discriminant evidence: if the battery rewarded verbosity, prompted responses would be longer
17 of 83 tested emotions show a significant association between word mentions in self-eval transcripts and cosine similarity to the emotion probe · finding · cosine 0.758
Validates that agentic self-evaluation captures the genuine emotional content of the probes
Philosophical vocabulary is negatively correlated with scores in the contemplative condition (model-level r=-0.72) · finding · cosine 0.753
Models that use more philosophy buzzwords score lower; the battery measures more than surface text features
Mean validated introspective fidelity across concept-model pairs: R²=0.12 (1B), 0.37 (3B), 0.61 (8B); pooled LMM β=0.29, p=5.55×10⁻⁹⁹ · finding · cosine 0.740
Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
PC1 explains 82% of variance in a factor analysis of 2224 data points across 6 scoring dimensions · finding · cosine 0.739
Dimensions are not independent: the composite score is the reliable signal, and the six dimensions are useful for understanding how, not how much
Baseline scores blend together at least three different things: latent reflective capacity, default accessibility, and stability of access. · claim · cosine 0.736
Conceptual decomposition arising from the data showing different models dissociate these traits
QwQ-32B on MATH-500: 21.0% reasoning-token reduction at intervention strength -0.96 with only 0.34% accuracy loss · finding · cosine 0.735
Demonstrates reflection redundancy in a stronger model on a harder math benchmark
Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66 · finding · cosine 0.735
Largest single-step scaling improvement; demonstrates a dramatic gain in interest introspection between the 1B and 3B models
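The similarity neighborhood lists entities whose cosine similarity to this one is at least 0.65 and that share no typed edge with it, ranked by score. Below is a minimal sketch of that filter. It assumes each entity has an embedding vector and that typed edges are stored as unordered pairs. The embedding model, the storage layout, and the names `similarity_neighbors`, `embeddings`, and `typed_edges` are hypothetical and are not taken from the tool that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Rank entities with cosine >= threshold and no typed edge to target.

    embeddings: hypothetical dict of entity id -> vector
    typed_edges: hypothetical set of frozenset({a, b}) pairs with a typed relation
    """
    t = embeddings[target]
    hits = []
    for eid, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge
        if eid == target or frozenset((target, eid)) in typed_edges:
            continue
        score = cosine(t, vec)
        if score >= threshold:
            hits.append((eid, round(score, 3)))
    # Highest similarity first, matching the descending order of the list
    return sorted(hits, key=lambda h: h[1], reverse=True)


emb = {"this": [1.0, 0.0], "a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.3]}
edges = {frozenset(("this", "c"))}
print(similarity_neighbors("this", emb, edges))
```

In the toy data, "b" falls below the threshold and "c" is excluded because it already has a typed edge, so only "a" survives as a candidate for a new edge.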