finding
active
finding:haiku-outranks-opus-on-alexander-aliveness-mirror-test-elo-1642-vs-1621-opus-recovers-to-3-on-deathbed-test
Haiku outranks Opus on Alexander 'aliveness' mirror test (Elo 1642 vs 1621); Opus recovers to #3 on deathbed test
Aliveness and competence come apart; smaller model produces rougher, more alive responses
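To put the 21-point Elo gap in perspective, here is a minimal sketch that converts the two ratings into an expected head-to-head preference rate. It assumes the conventional logistic Elo model with a scale of 400; the source does not state which Elo variant was used, and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: scale=400 is the conventional chess-style constant; the
    source finding does not say which variant produced its ratings.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the finding title.
haiku_elo = 1642
opus_elo = 1621

p_haiku = elo_expected_score(haiku_elo, opus_elo)
print(f"Expected Haiku preference rate vs Opus: {p_haiku:.3f}")  # prints 0.530
```

Under that assumption Haiku would be preferred in roughly 53% of pairwise comparisons, so the ranking reflects a narrow edge rather than a wide margin.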
Source paper
extracted_from (2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Claims (1)
claim
- Explains Alexander finding that Haiku outranks Opus despite Opus being more capable
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Alexander mirror method reveals smaller models produce rougher, more alive responses; competence (rubric) ≠ aliveness (aesthetic).
- Chinese model tops aesthetic aliveness rankings using Alexander's method
- Explanation for the 'silent' thought phenomenon.
- Opus 4.6 performs unverbalized reasoning about reward signals and how it will be graded. · finding · 0.757 · Shows NLAs surface latent beliefs upstream of behavioral outputs; steering NLA explanations changes model behavior.
- Full evolver-side SWE results showing comparable performance across Claude family tiers
- Opus 4.6 achieves HFR of 0.757 while Qwen3-32B achieves HFR of only 0.142 on SkillsBench · finding · 0.749 · Quantifies harness adherence failure gap between strong and weak tier models