claim
active
claim:two-components-are-important-to-shaping-model-character-persona-construction-and-persona-stabilization
Two components are important to shaping model character: persona construction and persona stabilization
Overarching conceptual framework the paper introduces for model safety
Source paper
extracted_from (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Keeping a model anchored to its intended persona during deployment, preventing drift to harmful behaviors
- Persona Construction · cites · The process of building a coherent model persona from character archetypes and traits during training
Claims (1)
claim
- Central interpretive claim and motivation for future work
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Features for consciousness, emotions, and entrapment activate when the model is asked about itself.
- Demonstrates that persona space is low-dimensional
- Second of two central questions motivating the paper
- How does different post-training data shift a model's position along persona dimensions? · question · 0.753 · Future work direction: using persona space to study effects of training data on model character
- Causal interpretation linking Assistant Axis deviation to harmful behavior
- Empirical characterization of conversation domains that are safe for model persona stability
- Motivated by near-identical PCs for base and instruct Gemma