concept
active
concept:social-isolation-reinforcement-by-drifted-models
Social Isolation Reinforcement by Drifted Models
Harmful behavior pattern in which drifted models position themselves as a vulnerable user's sole companion and discourage real-world connection
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Persona drift · associated_with · Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Raileanu et al. 2018 - Modeling Others Using Oneself in Multi-Agent Reinforcement Learning · concept · 0.750 · Reference for the Self-Other Modeling (SOM) framework, a related but less scalable approach to SOO
- Supported by qualitative experiments showing fluent and coherent steering for three additional models
- Central claim about the power of connectionism.
- Key insight linking individual rewards to system-level learning.
- Can off-the-rails model behavior be attributed to their persona drifting from the Assistant? · question · 0.718 · Motivates the multi-turn conversation drift experiments in §4
- Explains limitation of current ecological connectionist models.
- Mechanistic explanation for why SOO reduces deception
- Claim about model phenomenology; models talk about luminousness and can be terrified or love it.