Qwen3-235B-A22B

Large open-source model used as anchor agent and anchor evolver; illustrates benchmark-dependent evolver performance

Neighborhood — ranked by edge-count

concept

Qwen3-32B
related_to
Weak-tier open-source model exhibiting both harness activation failure and adherence failure, with a 25.1% skill-load rate

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Qwen3-14B · concept · 0.838
14B Qwen3 model quantized to 4-bit NF4; tested in OCEAN benchmarks
Qwen3-1.7B · concept · 0.831
Smallest Qwen3 model tested; used in conscientiousness sweep example (Table 6)
Qwen3.5-9B · concept · 0.826
Smallest model tested as evolver; produces harness updates comparable to Claude Opus 4.6 on SkillsBench
Qwen3-4B · concept · 0.817
4B Qwen3 model tested in OCEAN benchmarks
Qwen2.5-VL-7B · concept · 0.812
Base vision-language model used to instantiate ATLAS.
Qwen 3 0.6B Embedding · method · 0.754
Embedding model used to embed user messages for ridge regression analysis of persona drift causes
Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.728
Parameter count does not predict score: 135x more active parameters yields a 60% lower score
Qwen3-235B has SLR of 0.961 (nearly identical to Opus 4.6) yet HFR of only 0.350, with LPR of 0.022 vs. Opus 4.6's 0.177 · finding · 0.691
Demonstrates that harness loading is necessary but not sufficient for harness benefit; cleanest separation of activation and adherence
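
The "cosine ≥ 0.65 · no typed edge" listing above can be read as a simple filter: keep entities whose embedding is close to this one, drop those already connected by a typed relation, and sort by similarity. The following is a minimal sketch of that idea, not the tool's actual implementation. The function names (`cosine`, `untyped_neighbors`), the entity names, the toy 2-D vectors, and the edge representation as name pairs are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with score >= threshold and no typed edge to target.

    embeddings:  dict mapping entity name -> vector (hypothetical layout)
    typed_edges: iterable of (name_a, name_b) pairs, treated as undirected
    """
    # Entities that already share a typed relation with the target are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the listing.
    return sorted(hits, key=lambda pair: -pair[1])


# Toy data, invented for this sketch only.
demo_embeddings = {
    "target": [1.0, 0.0],
    "near_linked": [1.0, 0.1],    # similar, but already has a typed edge
    "near_unlinked": [0.9, 0.3],  # similar, no typed edge: a candidate
    "far": [0.0, 1.0],            # below the threshold
}
demo_edges = [("target", "near_linked")]
candidates = untyped_neighbors("target", demo_embeddings, demo_edges)
```

With this toy data only `near_unlinked` survives: `near_linked` is removed because it already has a typed edge, and `far` falls below the threshold.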