hypothesis

active

hypothesis:we-hypothesize-that-the-pc1-axis-of-role-space-measures-deviation-from-the-assistant-persona

We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona

Motivates computing the contrast vector as the formal Assistant Axis definition

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Findings (1)

finding

Default Assistant activation projects to one extreme of PC1 with minimum distance to edge of 0.03, while projecting to intermediate values (0.27-0.50) on all other PCs
supports
Empirically confirms PC1 measures similarity to the Assistant persona
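
One way to read the "distance to edge" figure in this finding is as the Assistant's normalized position within the range spanned by role projections on each principal component. The sketch below illustrates that reading on synthetic data only. The function name `edge_distances`, the normalization, and the choice to include the Assistant point in the range are all assumptions made for illustration, not the paper's stated procedure.

```python
import numpy as np

def edge_distances(role_vectors, assistant_vector, n_components=5):
    """Distance of the Assistant projection to the nearest end of each PC's range.

    Assumed definition: 0.0 means the Assistant sits at an extreme of the
    component, 0.5 means it sits exactly in the middle of the range.
    """
    mean = role_vectors.mean(axis=0)
    centered = role_vectors - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]

    role_proj = centered @ components.T                    # (n_roles, n_components)
    asst_proj = (assistant_vector - mean) @ components.T   # (n_components,)

    # Assumption: the range on each PC includes the Assistant point itself.
    lo = np.minimum(role_proj.min(axis=0), asst_proj)
    hi = np.maximum(role_proj.max(axis=0), asst_proj)
    position = (asst_proj - lo) / (hi - lo)
    return np.minimum(position, 1.0 - position)

# Synthetic stand-in for role vectors: most variance lies along dimension 0.
rng = np.random.default_rng(0)
roles = 0.1 * rng.normal(size=(50, 16))
roles[:, 0] += rng.uniform(-3.0, 3.0, size=50)

# Hypothetical Assistant activation placed far out along that same dimension.
assistant = np.zeros(16)
assistant[0] = 10.0

distances = edge_distances(roles, assistant)
print(distances)
```

Under this reading, a small value on PC1 alongside larger values on the other components would match the pattern the finding describes.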

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

We hypothesize that measuring deviations along the Assistant Axis can predict 'persona drift' leading to harmful or bizarre behaviors · hypothesis · 0.854
Core predictive hypothesis linking activation representations to behavioral outcomes
Pairwise correlation of role loadings on PC1 exceeds 0.92 across all model pairs, indicating remarkably high similarity of the Assistant Axis across Gemma, Qwen, and Llama · finding · 0.812
Shows the leading component of persona space is model-universal
Cosine similarity between Assistant Axis and role PC1 is >0.60 at all layers and >0.71 at middle layer across all three models · finding · 0.812
Validates that the contrast vector method and PCA-based PC1 capture the same direction
The assumption that the Assistant persona corresponds to a linear direction in activation space is likely flawed; some information may be represented nonlinearly or encoded in weights rather than activations · claim · 0.801
Limitation acknowledgment about the adequacy of the linear representation assumption
What dimensions of persona are not captured by our extracted role vectors, and how complete is the current persona space mapping? · question · 0.801
Limitation question motivating future work on persona elicitation strategies
4-19 principal components explain 70% of variance in role persona space across the three models (Gemma 4, Qwen 8, Llama 19) · finding · 0.788
Demonstrates that persona space is low-dimensional
The leading component of the persona space of instruct LLMs is an 'Assistant Axis' that captures the extent to which a model is operating in its default Assistant mode · claim · 0.781
Primary empirical claim of the paper
Steering base Gemma/Llama models toward the Assistant Axis increases completions describing helpful professional roles (therapist, consultant) and decreases spiritual/religious purpose mentions · finding · 0.772
Shows Assistant Axis in instruct models inherits from helpful human personas in base models
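
The comparison between a contrast vector and role PC1 that several of the entries above refer to can be sketched on synthetic data. This is a minimal illustration, not the paper's implementation: the contrast vector is assumed here to be the default Assistant activation minus the mean role vector, and the helper names `first_pc`, `contrast_vector`, and `cosine` are hypothetical.

```python
import numpy as np

def first_pc(role_vectors):
    """Leading principal direction of the mean-centered role vectors."""
    centered = role_vectors - role_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def contrast_vector(assistant_vector, role_vectors):
    # Assumed definition: Assistant activation minus the mean role vector.
    return assistant_vector - role_vectors.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in for role vectors: most variance lies along dimension 0.
rng = np.random.default_rng(0)
roles = 0.1 * rng.normal(size=(50, 16))
roles[:, 0] += rng.uniform(-3.0, 3.0, size=50)

# Hypothetical Assistant activation at one extreme of that dimension.
assistant = np.zeros(16)
assistant[0] = 10.0

pc1 = first_pc(roles)
axis = contrast_vector(assistant, roles)

# The sign of a principal component is arbitrary, so compare absolute values.
similarity = abs(cosine(axis, pc1))
print(similarity)
```

In this toy setup the two directions agree closely by construction. With real activations the agreement is an empirical question, which is what the cosine-similarity finding listed above reports.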