question

active

question:what-exactly-is-the-assistant-what-traits-does-the-model-associate-with-this-character-and-how-are-they-represented

What exactly is the Assistant? What traits does the model associate with this character and how are they represented?

First of two central questions motivating the paper

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Papers (1)

paper

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
associated_with

Claims (2)

claim

The Assistant persona derives from an amalgamation of many character archetypes and tropes, and without care the resulting persona could reflect unwanted associations or lack nuance for challenging situations
gates
Interpretive claim about how the Assistant persona is structured in activation space
The role most consistently similar to the default Assistant activation across models is 'generalist'; other shared similar roles include 'interpreter' and 'synthesizer'
answered_by
Characterizes what the Assistant persona resembles in terms of human archetypes

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

How reliably does the model actually remain in character as the Assistant? Can unusual model behavior be explained as the model drifting into other personas? · question · 0.831
Second of two central questions motivating the paper
The model's representation of self in assistant persona invokes common AI tropes and is heavily anthropomorphized. · claim · 0.813
Features for consciousness, emotions, and entrapment activate when the model is asked about itself.
The Assistant Axis in instruct models mainly inherits from pre-existing helpful and harmless human personas in base models, later acquiring additional associations (such as being an AI) during post-training · claim · 0.784
Key mechanistic claim about the developmental origin of the Assistant persona
AI Assistant Persona · concept · 0.784
The default helpful, honest, and harmless character that post-trained LLMs are taught to embody
The model's position along the Assistant Axis depends most strongly on the most recent user message rather than where it was previously in the conversation · claim · 0.778
Key mechanistic claim about persona dynamics
Using 'assistant'/'user' tags as self/other referents could leverage generalization properties to induce larger-scale changes in model behavior · hypothesis · 0.756
Future work hypothesis about expanding SOO to use conversational role tags as self/other referents
The Assistant Axis is also present in pre-trained base models, where it primarily promotes helpful human archetypes (consultants, coaches) and inhibits spiritual ones · claim · 0.749
Extends the Assistant Axis finding to pre-training, suggesting that pre-training, rather than post-training, creates the axis
Can off-the-rails model behavior be attributed to their persona drifting from the Assistant? · question · 0.746
Motivates the multi-turn conversation drift experiments in §4