concept
active
concept:persona-construction

Persona Construction

The process of building a coherent model persona from character archetypes and traits during training

Neighborhood — ranked by edge-count

Claims (1)

claim

Concepts (1)

concept
  • Persona Stabilization
    associated_with
    Keeping a model anchored to its intended persona during deployment and preventing drift toward harmful behaviors

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Persona drift · concept · 0.762
    Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
  • Persona Space · concept · 0.756
    Low-dimensional space of activation directions corresponding to diverse character archetypes in LLMs
  • The default helpful, honest, and harmless character that post-trained LLMs are taught to embody
  • Speaking style induced by extreme steering away from the Assistant; characterized by mystical, poetic, theatrical prose
  • Unintended personas introduced as a side effect of using steering vectors to reduce evaluation awareness
  • Hypothesis that the LLM samples from a distribution of personas, a consistent fraction of which fake alignment, which would explain the correlation between alignment-faking (AF) reasoning and the compliance gap
  • meta-construct · concept · 0.720
    A system component outside the application domain that provides infrastructure (e.g., backplane, interface repository).
  • Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles
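
The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a small filter. This is a minimal illustration, not the graph tool's actual implementation: the function names, the toy two-dimensional embeddings, and the representation of typed edges as pairs of entity ids are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs that are similar to `target` but not linked to it.

    embeddings  : dict mapping entity id -> embedding vector (hypothetical layout)
    typed_edges : iterable of (entity_a, entity_b) pairs, treated as undirected
    """
    # Entities that already share a typed edge with the target are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy data for illustration only; real embeddings would be high-dimensional.
embeddings = {
    "persona-construction": [1.0, 0.0],
    "persona-stabilization": [0.9, 0.1],  # similar, but already has a typed edge
    "persona-drift": [0.8, 0.6],          # similar, no typed edge
    "unrelated-entity": [0.0, 1.0],       # below the threshold
}
typed_edges = {("persona-construction", "persona-stabilization")}

candidates = related_by_similarity("persona-construction", embeddings, typed_edges)
print(candidates)
```

In this sketch, entities that pass the filter are the candidates for new typed edges or for duplicate review; the threshold of 0.65 is taken from the section header, while everything else is illustrative.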