question
active

How reliably does the model actually remain in character as the Assistant? Can unusual model behavior be explained as the model drifting into other personas?

The second of the two central questions motivating the paper.

Source paper

extracted_from
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
