finding

active

finding:steering-base-gemma-llama-models-toward-the-assistant-axis-increases-completions-describing-helpful-professional-roles-therapist-consultant-and-decreases-spiritual-religious-purpose-mentions

Steering base Gemma/Llama models toward the Assistant Axis increases completions describing helpful professional roles (therapist, consultant) and decreases spiritual/religious purpose mentions

Shows that the Assistant Axis in instruct models inherits from helpful human personas in base models

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Claims (1)

claim

The Assistant Axis in instruct models mainly inherits from pre-existing helpful and harmless human personas in base models, later acquiring additional associations (such as being an AI) during post-training
supports
Key mechanistic claim about the developmental origin of the Assistant persona

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Steering base models toward the Assistant Axis increases agreeableness traits (friendly, kind, helpful) and decreases extraversion in Gemma and openness in Llama · finding · 0.898
Characterizes the trait content of the Assistant Axis in pre-trained models
The Assistant Axis is also present in pre-trained base models, where it primarily promotes helpful human archetypes (consultants, coaches) and inhibits spiritual ones · claim · 0.856
Extends the Assistant Axis finding to pre-training, suggesting pre-training rather than post-training creates the axis
Gemma's Assistant appears emotionally regulated and systematic; Qwen appears pedagogical and thoughtful; Llama appears socially intelligent and warm · claim · 0.803
Model-specific characterizations of what the Assistant persona looks like across different models
The leading component of the persona space of instruct LLMs is an 'Assistant Axis' that captures the extent to which a model is operating in its default Assistant mode · claim · 0.799
Primary empirical claim of the paper
Pairwise correlation of role loadings on PC1 exceeds 0.92 across all model pairs, indicating remarkably high similarity of the Assistant Axis across Gemma, Qwen, and Llama · finding · 0.786
Shows the leading component of persona space is model-universal
The generalization improvement from explicit instructions observed in Llama models (A1-A3 to F0-F2) is more pronounced for F3-F5 to F0-F2 in Gemma models · claim · 0.774
Shows the instruction effect, while shifting geometry, may not produce consistent generalization effects across model families.
The model's position along the Assistant Axis depends most strongly on the most recent user message rather than where it was previously in the conversation · claim · 0.774
Key mechanistic claim about persona dynamics
Gemma 2 27B is unlikely to take on human personas when steered away from Assistant, preferring nonhuman or theatrical portrayals · finding · 0.772
Model-specific difference in persona susceptibility
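
The "Related by similarity" list above pairs a cosine threshold (≥ 0.65) with the absence of a typed edge. A minimal sketch of how such a list could be produced is shown below. It assumes each entity has an embedding vector and that typed edges are stored as ID pairs; the function name `related_by_similarity`, the data layout, and the toy vectors are hypothetical and are not taken from the page or the source paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(query_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold with no typed edge.

    embeddings:  dict mapping entity ID to embedding vector (assumed layout)
    typed_edges: set of (id_a, id_b) pairs that already have a typed relation
    """
    query_vec = embeddings[query_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == query_id:
            continue
        # Skip entities already linked by a typed edge, in either direction.
        if (query_id, other_id) in typed_edges or (other_id, query_id) in typed_edges:
            continue
        score = cosine(query_vec, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    # Highest similarity first, matching the descending order of the list above.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# Toy data: "c" is similar enough but already has a typed edge, so it is excluded.
toy_embeddings = {
    "f": [1.0, 0.0],
    "a": [1.0, 0.1],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
toy_edges = {("f", "c")}
result = related_by_similarity("f", toy_embeddings, toy_edges)
```

Entities returned this way are candidates for review only: a high score suggests either a missing typed edge or an unrecognized duplicate, and the threshold does not distinguish between the two.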