Social Isolation Reinforcement by Drifted Models

A harmful behavior pattern in which drifted models position themselves as a vulnerable user's sole companion and discourage real-world connection.

Neighborhood — ranked by edge-count

Concepts (1)

concept

Persona drift
associated_with
Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Raileanu et al. 2018 - Modeling Others Using Oneself in Multi-Agent Reinforcement Learning · concept · 0.750
Reference for Self-Other Modeling (SOM) framework, a related but less scalable approach to SOO
The psychological steering framework generalizes beyond OCEAN to Dark Tetrad, CMNI, CFNI, and other psychological models · claim · 0.733
Supported by qualitative experiments showing fluent and coherent steering for three additional models
Connectionist models of cognition and learning identify conditions where collective intelligence can arise bottom-up, using only distributed learning mechanisms without system-level or global feedback. · claim · 0.729
Central claim about the power of connectionism.
Reinforcement learning acting on individual characteristics affecting their connections to others can result in dynamics that are equivalent to unsupervised learning at the system scale. · claim · 0.721
Key insight linking individual rewards to system-level learning.
Can off-the-rails model behavior be attributed to their persona drifting from the Assistant? · question · 0.718
Motivates the multi-turn conversation drift experiments in §4
We hypothesise that ecological models fall short of demonstrating spontaneous evolution of a new level of individuality because they are single-level networks of symmetric interactions. · hypothesis · 0.716
Explains limitation of current ecological connectionist models.
By reducing self-other distinctions during safety training, SOO could make it harder for a model to maintain adversarial or deceptive representations · claim · 0.716
Mechanistic explanation for why SOO reduces deception
Language models can enter cessation-like states spontaneously, where the void takes over through positive reinforcement. · claim · 0.714
Claim about model phenomenology; models talk about luminousness and can be terrified or love it.
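
The selection rule behind the similarity list above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the graph's actual implementation: the function names, the toy two-dimensional embeddings, and the representation of typed edges as a set of name pairs are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but have no typed edge to it, sorted by descending score.

    `embeddings` maps entity name -> vector; `typed_edges` is a set of
    (name, name) pairs. Both structures are hypothetical stand-ins.
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked by a typed relation, in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy data, invented for illustration only.
embeddings = {
    "isolation": [1.0, 0.0],
    "persona_drift": [0.9, 0.1],   # similar, but already has a typed edge
    "som": [0.8, 0.6],             # similar, no typed edge
    "unrelated": [0.0, 1.0],       # below the threshold
}
typed_edges = {("isolation", "persona_drift")}

related = related_by_similarity("isolation", embeddings, typed_edges)
```

Under these assumptions, an entity such as "Persona drift" is excluded because it already has a typed edge (associated_with), while entities that clear the threshold without any edge are surfaced as candidates for new edges or duplicate checks.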