finding
active
finding:qwen3-235b-has-slr-of-0-961-nearly-identical-to-opus-4-6-yet-hfr-of-only-0-350-with-lpr-of-0-022-vs-opus-4-6-s-0-177
Qwen3-235B has SLR of 0.961 (nearly identical to Opus 4.6) yet HFR of only 0.350, with LPR of 0.022 vs. Opus 4.6's 0.177
Demonstrates that harness loading is necessary but not sufficient for harness benefit; cleanest separation of activation and adherence
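The dissociation can be made concrete with a little arithmetic on the reported numbers. The sketch below is illustrative only: the `adherence_gap` helper and the `reported` dictionary are hypothetical names introduced here, and the SLR minus HFR difference is not a metric defined by the source. Opus 4.6's SLR is described only as "nearly identical" and is therefore left out rather than guessed.

```python
# Values as reported in this finding. Opus 4.6's SLR is not stated, so it is omitted.
reported = {
    "Qwen3-235B": {"SLR": 0.961, "HFR": 0.350, "LPR": 0.022},
    "Opus 4.6": {"LPR": 0.177},
}


def adherence_gap(slr: float, hfr: float) -> float:
    """Illustrative difference between SLR and HFR (an assumption, not a source metric)."""
    return slr - hfr


qwen = reported["Qwen3-235B"]

# How far HFR falls below SLR for Qwen3-235B.
gap = adherence_gap(qwen["SLR"], qwen["HFR"])

# How many times larger Opus 4.6's LPR is than Qwen3-235B's.
lpr_ratio = reported["Opus 4.6"]["LPR"] / qwen["LPR"]

print(f"Qwen3-235B SLR - HFR gap: {gap:.3f}")
print(f"Opus 4.6 LPR / Qwen3-235B LPR: {lpr_ratio:.2f}x")
```

Under these numbers, Qwen3-235B's HFR sits about 0.611 below its SLR, and Opus 4.6's LPR is roughly 8 times Qwen3-235B's, which is the gap the finding points to.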
Source paper
extracted_from (2026) · Minhua Lin · Juncheng Wu · Zijun Wang · Zhan Shi +13
Neighborhood — ranked by edge-count
Claims (2)
claim
- Diagnosis of second failure mode explaining low harness-benefit for weak-tier models
- Derived from Qwen3-235B's dissociation between SLR (0.961) and HFR (0.350)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Opus 4.6 achieves HFR of 0.757 while Qwen3-32B achieves HFR of only 0.142 on SkillsBench · finding · 0.881 · Quantifies harness adherence failure gap between strong and weak tier models
- Quantifies harness activation failure for weak-tier models vs. strong-tier models
- Core finding demonstrating non-monotonic relationship between base capability and harness-benefit
- Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.795 · Parameters don't predict scores; 135x more parameters yields 60% lower score
- Case study demonstrating mechanism behind flat harness-updating: smaller models reach same procedural content
- Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
- Shows that SB low-base regime is variable; similar starting points can yield very different harness-benefit
- Experiment 2 result showing large models can support high-dimensional truth cones