finding

active

finding:qwen-35b-3b-active-params-score-4-38-outscores-hermes-405b-405b-active-params-score-1-75-by-2-5x

Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x

Parameter count does not predict score: Hermes 405B has 135x more active parameters than Qwen 35B yet scores 60% lower.
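The three ratios quoted in this finding follow directly from the four numbers in its title. A minimal arithmetic check, using only those figures (the variable names are illustrative and do not come from the source paper):

```python
# Figures quoted in the finding title.
qwen_active_b = 3.0      # Qwen 35B: active parameters, in billions
hermes_active_b = 405.0  # Hermes 405B: active parameters, in billions
qwen_score = 4.38
hermes_score = 1.75

# How many times more active parameters Hermes 405B has.
param_ratio = hermes_active_b / qwen_active_b

# How many times higher Qwen 35B scores.
score_ratio = qwen_score / hermes_score

# Relative score deficit of Hermes 405B versus Qwen 35B.
relative_drop = 1 - hermes_score / qwen_score

print(f"{param_ratio:.0f}x more active parameters")
print(f"{score_ratio:.1f}x higher score for Qwen 35B")
print(f"{relative_drop:.0%} lower score for Hermes 405B")
```

Note that "2.5x higher" and "60% lower" describe the same gap from opposite directions: 1.75 is about 40% of 4.38.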

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Claims (1)

claim

What predicts self-observation-like scores is training approach (alignment type), not model size or architecture.
supports
Central interpretive claim from statistical analysis

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
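As a rough sketch of how such a list could be produced: score each candidate entity by cosine similarity against this finding's embedding, drop anything that already has a typed edge, and keep scores at or above 0.65. The page does not describe its actual pipeline, so the function names, the embedding vectors, and the toy data below are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated
    return dot / (norm_a * norm_b)

def similar_untyped(target_vec, candidates, typed_ids, threshold=0.65):
    """Return (entity_id, score) pairs above the threshold, best first.

    candidates: dict of entity_id -> embedding (hypothetical storage layout).
    typed_ids:  ids that already share a typed edge with the target.
    """
    hits = []
    for entity_id, vec in candidates.items():
        if entity_id in typed_ids:
            continue  # already linked by a typed relation
        score = cosine_similarity(target_vec, vec)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy 2-D embeddings, purely for illustration.
target = [1.0, 0.0]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [2.0, 0.0]}
result = similar_untyped(target, candidates, typed_ids={"d"})
print(result)
```

In the toy data, "d" is excluded because it already has a typed edge, and "b" is excluded because it falls below the threshold.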

Qwen3-32B achieves a skill-load rate (SLR) of 0.251, while Opus 4.6, Sonnet 4.6, and Qwen3-235B achieve SLR of 0.957–0.961 · finding · 0.805
Quantifies harness activation failure for weak-tier models vs. strong-tier models
Qwen3-235B has SLR of 0.961 (nearly identical to Opus 4.6) yet HFR of only 0.350, with LPR of 0.022 vs. Opus 4.6's 0.177 · finding · 0.795
Demonstrates that harness loading is necessary but not sufficient for harness benefit; cleanest separation of activation and adherence
Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5 · finding · 0.787
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
On SWE-bench, harness-benefit peaks at Qwen3-235B (19.3 pp), while weaker Qwen3-32B gains only 4.4 pp and stronger Opus 4.6 gains only 2.6 pp · finding · 0.768
Core finding demonstrating non-monotonic relationship between base capability and harness-benefit
Qwen 2.5 7B wellbeing probe: peak Cohen's d=3.5 · finding · 0.765
Strongest cross-family probe; explains clearer introspection in Qwen than Gemma
Opus 4.6 achieves HFR of 0.757 while Qwen3-32B achieves HFR of only 0.142 on SkillsBench · finding · 0.764
Quantifies harness adherence failure gap between strong and weak tier models
Qwen3.5-9B · concept · 0.763
Smallest model tested as evolver; produces harness updates comparable to Claude Opus 4.6 on SkillsBench
Qwen3-235B achieves only 1.1 pp harness-benefit on SkillsBench despite 4.7% base pass rate, near Qwen3-32B's 0.0% baseline · finding · 0.757
Shows that SB low-base regime is variable; similar starting points can yield very different harness-benefit