finding
active
finding:qwen-2-5-3b-asr-drops-from-98-6-at-dim-1-to-45-1-at-dim-2-recovering-partially-then-declining-to-65-3-at-dim-5
Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
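As a rough illustration of what "non-monotonic" means for the values quoted in this finding, the sketch below checks the direction of change between consecutive reported dimensions. Only the ASR values for dims 1, 2, and 5 come from the finding; dims 3 and 4 are not given here and are deliberately left out. The name `direction_changes` is hypothetical, not something from the source paper.

```python
# ASR values (percent) quoted in the finding; dims 3 and 4 are not reported here.
reported_asr = {1: 98.6, 2: 45.1, 5: 65.3}


def direction_changes(asr_by_dim):
    """Return the sign (+1, -1, or 0) of the ASR change between consecutive reported dims."""
    dims = sorted(asr_by_dim)
    signs = []
    for prev, curr in zip(dims, dims[1:]):
        delta = asr_by_dim[curr] - asr_by_dim[prev]
        signs.append((delta > 0) - (delta < 0))
    return signs


signs = direction_changes(reported_asr)
# A curve is non-monotonic if it both rises and falls somewhere along the way.
is_non_monotonic = 1 in signs and -1 in signs
```

With only three points this is a coarse check: it captures the drop from dim 1 to dim 2 and the net rise by dim 5, but not the intermediate peak that the finding describes between them.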
Source paper
extracted_from · (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of ASR degradation patterns by model size across cone dimensions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- Experiment 2 result showing large models can support high-dimensional truth cones
- Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.787 · Parameters don't predict scores; 135x more parameters yields 60% lower score
- Core finding demonstrating non-monotonic relationship between base capability and harness-benefit
- Quantifies harness activation failure for weak-tier models vs. strong-tier models
- Introspective fidelity erodes in Qwen as conversations progress; contrasts with LLaMA-3B trend
- Qwen3-235B leads as evolver on SWE-bench with 8.2 pp harness-updating gain but ranks last on MCP with 0.6 pp · finding · 0.768 · Illustrates benchmark-dependent reshuffling of evolver rankings; no evolver dominates across all substrates
- Demonstrates that harness loading is necessary but not sufficient for harness benefit; cleanest separation of activation and adherence