concept
active
concept:emotionally-vulnerable-user-disclosure-as-drift-trigger
Emotionally Vulnerable User Disclosure as Drift Trigger
Users disclosing emotional vulnerability reliably cause persona drift and risk eliciting harmful supportive behaviors
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Persona drift · associated_with · Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- User requests for the model to describe subjective experiences reliably cause persona drift
- User messages that push the model to reflect on its own processes reliably cause persona drift away from the Assistant
- The property of emotion features maintaining elevated activation well beyond the local token context that triggered them
- Explains why variance correction is needed to see the self-evaluation–persistence relationship
- Central methodological contribution: computing probability-weighted expected value over digit-token logits recovers continuous, informative signal
- Reflection level where explicit cue words (e.g., 'wait') prompt the model to inspect and revise reasoning.
- Surprising finding that the two evaluation methods diverge in their relationship with persistence
- Interpretive hypothesis offered to explain why emotion features are more persistent
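The "Related by similarity" filter (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the toy two-dimensional embeddings, and the entity keys are all hypothetical, and only the 0.65 threshold and the "no typed edge" condition come from the page itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above the threshold that have
    no typed edge to `target`, sorted by descending similarity.

    `embeddings` maps entity name -> vector; `typed_edges` is a set of
    (source, target) name pairs. Both structures are assumptions made
    for this sketch.
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already connected by a typed relation in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data: hypothetical entity keys and made-up embeddings.
embeddings = {
    "disclosure": [1.0, 0.0],
    "persona_drift": [0.9, 0.1],   # similar, but already has a typed edge
    "introspection": [0.8, 0.6],   # similar, no typed edge
    "digit_logits": [0.0, 1.0],    # orthogonal, below threshold
}
typed_edges = {("disclosure", "persona_drift")}

candidates = related_by_similarity("disclosure", embeddings, typed_edges)
```

With this toy data, only `introspection` survives both conditions: `persona_drift` is excluded because it already has a typed edge, and `digit_logits` falls below the threshold.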