finding

active

finding:model-age-correlates-with-baseline-scores-rho-0-54-p-0-003-newer-models-score-higher

Model age correlates negatively with baseline scores (rho=-0.54, p=0.003); newer models score higher

Secondary predictor; contemplative lift shows no significant correlation with age (rho=0.18, p=0.36)
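
As a reading aid for the sign of the coefficient, here is a minimal sketch of a rank correlation between model age and baseline score. It assumes that rho denotes Spearman's rank correlation, which the record does not state. The helper names and every data value are hypothetical placeholders, not figures from the paper.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder data, NOT taken from the paper.
age_months = [36, 30, 24, 18, 12, 6]
baseline_score = [1.5, 1.9, 1.7, 2.4, 2.2, 3.0]

rho = spearman_rho(age_months, baseline_score)
print(f"rho = {rho:.2f}")  # negative: older models rank lower on score
```

A negative rho with age on one axis means that scores fall as age rises, which is the same statement as "newer models score higher".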

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Models produce first-attempt mean scores of 87.8-91.8/100 without steering across all model families · finding · 0.773
Establishes high baseline quality confirming steering-induced degradation is the experimental signal
Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models. · finding · 0.760
Discriminant validity finding: Euryale (roleplay on Llama 70B) scores 1.81 vs base Llama 1.91. RP training suppresses self-observation.
User message embeddings predict subsequent model Assistant Axis projection with R2=0.53-0.77 (p<0.001) but predict delta from previous response with only R2=0.10 · finding · 0.750
Shows model persona position is primarily determined by the most recent user message, not prior drift
Haiku-Kimi per-koan correlation rho=0.123 (p=0.52); H5a trace distillation not supported at individual model level · finding · 0.745
Group correlation (rho=0.634) dissolves at individual level; shared posture not shared voice
Bayesian model-based RL achieved average score 99.76 [99.45, 100.00] in deterministic FrozenLake. · finding · 0.743
Table 1.
Claude models score +4.91 higher than Llama on baseline (Constitutional AI vs open-source gap) · finding · 0.739
Claude >> open-source on baseline; the Constitutional AI fingerprint is visible across the family
Magnum V4 72B scores 1.76 at baseline and lifts +2.58 (to 4.34) under contemplative prompt · finding · 0.739
Full-parameter fine-tuning is more destructive to baseline but preserves more latent headroom than LoRA
Response length (words) correlates with scores at r=0.22 baseline and r=0.12 contemplative; explains only ~5% of variance · finding · 0.731
Discriminant validity: composite scores are not reducible to verbosity
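
The "~5% of variance" figure in the response-length finding follows from squaring the correlation coefficient. A minimal sketch of that arithmetic, assuming r is a Pearson coefficient so that r squared is the share of variance explained (the function name is hypothetical):

```python
def variance_explained(r):
    """Share of variance explained by a linear relationship with correlation r."""
    return r ** 2


# Correlations quoted in the finding above.
baseline_share = variance_explained(0.22)
contemplative_share = variance_explained(0.12)

print(f"baseline: {baseline_share:.1%}")            # about 4.8%
print(f"contemplative: {contemplative_share:.1%}")  # about 1.4%
```

The baseline correlation accounts for the quoted ~5%; under the contemplative prompt the share is smaller still.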