Persona Construction

The process of building a coherent model persona from character archetypes and traits during training

Neighborhood — ranked by edge-count

claim

concept

Persona Stabilization
associated_with
Keeping a model anchored to its intended persona during deployment, preventing drift toward harmful behaviors

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Persona drift · concept · 0.762
Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
Persona Space · concept · 0.756
Low-dimensional space of activation directions corresponding to diverse character archetypes in LLMs
AI Assistant Persona · concept · 0.747
The default helpful, honest, and harmless character that post-trained LLMs are taught to embody
Mystical/Theatrical Persona · concept · 0.742
Speaking style induced by extreme steering away from the Assistant; characterized by mystical, poetic, theatrical prose
alternative user personas · concept · 0.734
Unintended personas introduced as a side effect of using steering vectors to reduce eval awareness.
Persona Sampling Hypothesis · concept · 0.727
Hypothesis that an LLM samples from a distribution of personas, a consistent fraction of which fake alignment, which would explain the correlation between alignment-faking (AF) reasoning and the compliance gap
meta-construct · concept · 0.720
A system component outside the application domain that provides infrastructure (e.g., backplane, interface repository).
Persona Vectors (Chen et al.) · framework · 0.716
Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles
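
The "cosine ≥ 0.65 · no typed edge" filter that produces the list above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the function names (`cosine`, `untyped_neighbors`), the edge representation as name pairs, and the two-dimensional toy vectors are all assumptions made for the example, and the toy vectors are not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities similar to `target` that have no typed edge to it.

    `embeddings` maps entity name -> vector; `typed_edges` is a list of
    (source, destination) name pairs. Both shapes are hypothetical.
    Returns (name, score) pairs sorted by descending similarity.
    """
    # Collect every entity already linked to the target, in either direction.
    linked = {dst for src, dst in typed_edges if src == target}
    linked |= {src for src, dst in typed_edges if dst == target}

    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already related
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data for illustration only.
toy_embeddings = {
    "Persona Construction": [1.0, 0.0],
    "Persona Stabilization": [0.9, 0.1],  # similar, but already has a typed edge
    "Persona drift": [0.8, 0.6],          # similar, no typed edge
    "meta-construct": [0.0, 1.0],         # dissimilar
}
toy_edges = [("Persona Construction", "Persona Stabilization")]

candidates = untyped_neighbors("Persona Construction", toy_embeddings, toy_edges)
print(candidates)
```

Under this sketch, an entity such as Persona Stabilization is excluded even though it is close in embedding space, because it already carries an associated_with edge; only similar entities with no typed relation are surfaced as candidates for new edges or duplicate checks.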