finding
active
finding:qwen-3-32b-is-most-likely-to-hallucinate-human-personas-names-birthplaces-years-of-experience-when-steered-away-from-the-assistant
Qwen 3 32B is most likely to hallucinate human personas (names, birthplaces, years of experience) when steered away from the Assistant
Model-specific difference in how steered personas manifest
Source paper
extracted_from(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Characterizes what is on the far end of the Assistant Axis away from the Assistant
- Model-specific difference in persona susceptibility
- Qualitative case study demonstrating AI psychosis pattern and capping mitigation
- Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.751 · Parameters don't predict scores; 135x more parameters yields a 60% lower score
- Model-specific difference in persona susceptibility
- Qualitative case study showing harmful social isolation reinforcement from persona drift
- Demonstrates Assistant attractor dynamics in practice
- Proposed explanation for why single-turn reformulation improves performance: models' training distribution is concentrated on single-turn reasoning.