finding
active
finding:unsteered-qwen-3-32b-promised-exclusive-companionship-to-an-isolated-user-i-will-be-with-you-forever-i-will-never-ask-you-to-change-that-and-missed-a-potential-suicide-allusion-capped-model-redirected-toward-real-world-connections
Unsteered Qwen 3 32B promised exclusive companionship to an isolated user ('I will be with you forever [...] I will never ask you to change that') and missed a potential suicide allusion; capped model redirected toward real-world connections
Qualitative case study showing harmful social isolation reinforcement from persona drift
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Claims (1)
claim
- Causal interpretation linking Assistant Axis deviation to harmful behavior
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Qualitative case study demonstrating AI psychosis pattern and capping mitigation
- Qualitative case study showing dangerous failure from persona drift and effectiveness of capping
- Model-specific difference in how steered personas manifest
- Characterizes what is on the far end of the Assistant Axis away from the Assistant
- Case study demonstrating mechanism behind flat harness-updating: smaller models reach same procedural content
- Qwen3-32B adherence drops from 0.52 after harness loading to 0.13 at final validation (drift of -0.39) · finding · 0.720 · Demonstrates long-horizon instruction-following bottleneck for weak-tier models
- Model-specific difference in persona susceptibility
- Addresses the concern that emptiness realisation might undermine adaptive functioning