Qwen3-4B

4B Qwen3 model tested in OCEAN benchmarks

Neighborhood — ranked by edge-count

concept

Qwen3-1.7B
related_to
Smallest Qwen3 model tested; used in conscientiousness sweep example (Table 6)
Qwen3.5-9B
related_to
Smallest model tested as evolver; produces harness updates comparable to Claude Opus 4.6 on SkillsBench
Qwen3-32B
related_to
Weak-tier open-source model exhibiting both harness activation failure and adherence failure, with 25.1% skill-load rate
Qwen3-14B
related_to
14B Qwen3 model quantized to 4-bit NF4; tested in OCEAN benchmarks

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Qwen3-235B-A22B · concept · 0.817
Large open-source model used as anchor agent and anchor evolver; illustrates benchmark-dependent evolver performance
Qwen 3 0.6B Embedding · method · 0.815
Embedding model used to embed user messages for ridge regression analysis of persona drift causes
Qwen2.5-VL-7B · concept · 0.799
Base vision-language model used to instantiate ATLAS.
LLaMA3.1-8B · concept · 0.723
One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.694
Parameters don't predict scores; 135x more active parameters yield a 60% lower score
4-bit NF4 Quantization · concept · 0.691
Quantization applied to LLMs above 12B parameters to enable evaluation on available hardware
Qwen 3 32B is most likely to hallucinate human personas (names, birthplaces, years of experience) when steered away from the Assistant · finding · 0.691
Model-specific difference in how steered personas manifest
Qwen3-32B achieves a skill-load rate of 0.251, while Opus 4.6, Sonnet 4.6, and Qwen3-235B achieve SLR of 0.957–0.961 · finding · 0.689
Quantifies harness activation failure for weak-tier models vs. strong-tier models
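
The "cosine ≥ 0.65 · no typed edge" listing above can be approximated with a short sketch. This is only an illustration under stated assumptions: the function names, the toy two-dimensional vectors, and the placeholder entity names are hypothetical, and the actual embedding model and graph store behind this page are not specified here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold and no typed edge to target.

    embeddings:  dict mapping entity name -> vector (hypothetical data layout)
    typed_edges: iterable of (source, destination) pairs, treated as undirected
    """
    # Entities that already share a typed edge with the target are excluded.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the order of the listing.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data: placeholder names and made-up vectors, not real embeddings.
toy_embeddings = {
    "entity_a": [1.0, 0.0],
    "entity_b": [1.0, 0.1],   # already linked to entity_a by a typed edge
    "entity_c": [0.8, 0.6],
    "entity_d": [0.0, 1.0],   # orthogonal, so below the threshold
    "entity_e": [0.9, 0.3],
}
toy_edges = [("entity_a", "entity_b")]

candidates = untyped_neighbors("entity_a", toy_embeddings, toy_edges)
for name, score in candidates:
    print(f"{name} · {score:.3f}")
```

With the toy data, `entity_b` is skipped because it already has a typed edge, `entity_d` falls below the threshold, and the remaining two are returned in descending order of similarity. Each surviving pair would then be reviewed by hand as either a candidate for a new edge or a possible duplicate.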