claim · active

claim:the-assistant-axis-is-also-present-in-pre-trained-base-models-where-it-primarily-promotes-helpful-human-archetypes-consultants-coaches-and-inhibits-spiritual-ones

The Assistant Axis is also present in pre-trained base models, where it primarily promotes helpful human archetypes (consultants, coaches) and inhibits spiritual ones

Extends the Assistant Axis finding to pre-trained models, suggesting that the axis originates largely in pre-training rather than being created by post-training.

Source paper (extracted_from)

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood (ranked by edge count)

Claims (1)

The Assistant Axis in instruct models mainly inherits from pre-existing helpful and harmless human personas in base models, later acquiring additional associations (such as being an AI) during post-training (claim · edge: extends)
Key mechanistic claim about the developmental origin of the Assistant persona.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.

Steering base Gemma/Llama models toward the Assistant Axis increases completions describing helpful professional roles (therapist, consultant) and decreases mentions of spiritual or religious purpose (finding · 0.856)
Shows that the Assistant Axis in instruct models inherits from helpful human personas in base models.
Is the Assistant Axis formed during post-training or inherited from representations learned during pre-training? (question · 0.828)
Motivates the base-model steering experiments in §3.2.2.
Steering base models toward the Assistant Axis increases agreeableness traits (friendly, kind, helpful) and decreases extraversion in Gemma and openness in Llama (finding · 0.826)
Characterizes the trait content of the Assistant Axis in pre-trained models.
Assistant Axis (framework · 0.808)
Contrast vector between the mean default Assistant activation and the mean of all fully role-playing role vectors; the paper's main contribution.
The model's position along the Assistant Axis depends most strongly on the most recent user message rather than on where it was previously in the conversation (claim · 0.787)
Key mechanistic claim about persona dynamics.
The leading component of the persona space of instruct LLMs is an 'Assistant Axis' that captures the extent to which a model is operating in its default Assistant mode (claim · 0.765)
Primary empirical claim of the paper.
Projections onto the Assistant Axis could serve as a real-time measure of model coherence in deployment: a quantitative signal for when models are drifting from their intended identity (claim · 0.763)
Proposed future application of the Assistant Axis.
Most AI assistants are anti-Alexander by design: they perform helpfulness, show work, and list options rather than resolving into calm (claim · 0.759)
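The Assistant Axis entry above defines the axis as a contrast vector between the mean default Assistant activation and the mean of the role vectors; the steering findings and the proposed coherence signal both operate along that direction. The following is a minimal, hypothetical sketch of those three operations on toy vectors, not the paper's pipeline. Assumptions: the helper names (`assistant_axis`, `project`, `steer`) are invented for illustration, the axis points from the role mean toward the Assistant mean and is unit-normalized, and steering is modeled as adding `alpha` times the axis to an activation. How activations are extracted (which layer or tokens) is not reproduced here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def assistant_axis(default_activations: Sequence[Sequence[float]],
                   role_vectors: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: (mean Assistant activation - mean role vector), unit-normalized.

    Assumed sign convention: positive projections mean "more Assistant-like".
    """
    diff = [a - r for a, r in zip(mean(default_activations), mean(role_vectors))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("Assistant and role means coincide; axis is undefined")
    return [d / norm for d in diff]


def project(activation: Sequence[float], axis: Sequence[float]) -> float:
    """Scalar position of an activation along a unit-length axis (dot product)."""
    return sum(x * w for x, w in zip(activation, axis))


def steer(activation: Sequence[float], axis: Sequence[float], alpha: float) -> Vector:
    """Additive steering: shift the activation by alpha along the axis (alpha < 0 steers away)."""
    return [x + alpha * w for x, w in zip(activation, axis)]


if __name__ == "__main__":
    # Toy 3-d "activations", purely illustrative.
    default = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
    roles = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    axis = assistant_axis(default, roles)
    x = [0.3, 0.3, 0.3]
    print("projection before:", project(x, axis))
    print("projection after steering:", project(steer(x, axis, 2.0), axis))
```

Because the axis is unit length in this sketch, steering with coefficient `alpha` raises the projection by exactly `alpha`, which is why a projection like this could double as a drift signal: tracking it per turn shows how far an activation sits from the Assistant end of the axis.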