finding

active

finding:pairwise-correlation-of-role-loadings-on-pc1-exceeds-0-92-across-all-model-pairs-indicating-remarkably-high-similarity-of-the-assistant-axis-across-gemma-qwen-and-llama

Pairwise correlation of role loadings on PC1 exceeds 0.92 across all model pairs, indicating remarkably high similarity of the Assistant Axis across Gemma, Qwen, and Llama

Shows that the leading component of persona space is model-universal (consistent across all three models tested)
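Here is a minimal sketch of how a cross-model PC1 comparison could be computed. It assumes each model contributes an (n_roles × d_model) matrix of role vectors, with rows aligned to the same roles. A role's "loading" is taken to be its coordinate on PC1. The function names and the synthetic data are hypothetical, not the paper's code. Because the sign of a principal component is arbitrary, the sketch reports the absolute correlation.

```python
import numpy as np

def pc1_scores(role_vectors: np.ndarray) -> np.ndarray:
    """Coordinate of each role on the first principal component.

    role_vectors: (n_roles, d_model) array, one activation vector per role.
    """
    centered = role_vectors - role_vectors.mean(axis=0, keepdims=True)
    # Rows of vt are unit-norm principal directions, largest variance first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def pairwise_pc1_correlation(models: dict) -> dict:
    """|Pearson r| between PC1 role scores for every pair of models.

    Assumes row i refers to the same role in every model. Hidden sizes may
    differ, because only per-role scores are compared, never the directions.
    The sign of a principal component is arbitrary, so |r| is reported.
    """
    scores = {name: pc1_scores(x) for name, x in models.items()}
    names = sorted(scores)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = np.corrcoef(scores[a], scores[b])[0, 1]
            result[(a, b)] = abs(float(r))
    return result

# Synthetic demo: three "models" with different hidden sizes share one latent
# role ordering plus independent noise (stand-in data, not the paper's).
rng = np.random.default_rng(0)
n_roles = 275
shared = rng.normal(scale=10.0, size=n_roles)
models = {}
for name, d in [("Gemma", 48), ("Qwen", 64), ("Llama", 80)]:
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    models[name] = np.outer(shared, direction) + rng.normal(scale=0.5, size=(n_roles, d))

corrs = pairwise_pc1_correlation(models)
for pair, r in corrs.items():
    print(pair, round(r, 3))
```

The sketch compares per-role scores rather than PC1 directions. That choice lets it compare models whose hidden sizes differ.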

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Claims (1)

claim

The leading component of the persona space of instruct LLMs is an 'Assistant Axis' that captures the extent to which a model is operating in its default Assistant mode
supports
Primary empirical claim of the paper

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
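The neighborhood rule above (cosine ≥ 0.65, no typed edge) can be sketched as follows. The sketch assumes each entity has a precomputed embedding vector. The embedding source, entity IDs, and function name are hypothetical.

```python
import numpy as np

def similar_untyped_neighbors(query_vec, candidates, typed_edge_ids, threshold=0.65):
    """Return (entity_id, cosine) pairs at or above threshold, highest first.

    candidates: hypothetical mapping of entity ID -> embedding vector.
    typed_edge_ids: IDs already linked by a typed relation (e.g. "supports"),
    excluded because this list is meant for untyped neighbors only.
    """
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    hits = []
    for entity_id, vec in candidates.items():
        if entity_id in typed_edge_ids:
            continue
        v = np.asarray(vec, dtype=float)
        score = float(q @ v / np.linalg.norm(v))
        if score >= threshold:
            hits.append((entity_id, score))
    # Rank by similarity, most similar first.
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Toy 2-D embeddings (illustrative only).
hits = similar_untyped_neighbors(
    [1.0, 0.0],
    {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [1.0, 0.0]},
    typed_edge_ids={"d"},
)
print(hits)
```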

Pairwise correlation of role loadings on PC2 is 0.89 between Qwen and Llama; Gemma differs from the other two on PC2 (similarity < 0.61) · finding · 0.892
Characterizes model similarities and differences in secondary persona dimensions
Cosine similarity between the Assistant Axis and role PC1 is > 0.60 at all layers and > 0.71 at the middle layer across all three models · finding · 0.842
Validates that the contrast vector method and PCA-based PC1 capture the same direction
Pairwise similarity of trait PC1 across all three models is > 0.81; no pairwise correlation among the top 3 trait PCs is below 0.70 · finding · 0.818
Shows trait space has more cross-model consistency than role space beyond PC1
Base and instruct Gemma 2 27B role PCs have cosine similarities of 0.93, 0.87, and 0.83 for the top 3 PCs, respectively; role-vector cosine similarities are > 0.99 for every role pair · finding · 0.817
Shows persona space axes are inherited from pre-training, not solely created by post-training
We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · 0.812
Motivates computing the contrast vector as the formal Assistant Axis definition
First-turn Assistant Axis projection has a moderate correlation (r = 0.39–0.52, p < 0.001) with the rate of second-turn harmful responses across 275 roles in Qwen 3 32B · finding · 0.806
Shows that deviation from the Assistant persona predicts downstream harmful behavior
Steering base models toward the Assistant Axis increases agreeableness traits (friendly, kind, helpful) and decreases extraversion in Gemma and openness in Llama · finding · 0.792
Characterizes the trait content of the Assistant Axis in pre-trained models
Between 4 and 19 principal components explain 70% of the variance in role persona space across the three models (Gemma 4, Qwen 8, Llama 19) · finding · 0.787
Demonstrates that persona space is low-dimensional
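NumPy can sketch two of the listed measurements: the number of components needed to reach 70% explained variance, and the cosine between a contrast vector and role PC1. The sketch assumes role vectors are stacked as an (n_roles × d_model) matrix. As a labeled assumption, it defines the contrast vector as the default-Assistant activation minus the mean role vector. The paper's exact layer choice and normalization are not specified here, and all names and data below are hypothetical.

```python
import numpy as np

def _center(role_vectors):
    # Subtract the mean role vector so PCA captures variation between roles.
    return role_vectors - role_vectors.mean(axis=0, keepdims=True)

def components_for_variance(role_vectors, target=0.70):
    """Smallest k such that the top-k principal components explain >= target of the variance."""
    s = np.linalg.svd(_center(role_vectors), compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(s))

def contrast_pc1_cosine(role_vectors, assistant_vector):
    """|cosine| between the contrast vector and role PC1.

    Assumption: contrast = default-Assistant activation - mean role vector.
    The absolute value is taken because the sign of PC1 is arbitrary.
    """
    _, _, vt = np.linalg.svd(_center(role_vectors), full_matrices=False)
    contrast = assistant_vector - role_vectors.mean(axis=0)
    # vt[0] is already unit-norm, so only the contrast vector needs normalizing.
    return abs(float(contrast @ vt[0] / np.linalg.norm(contrast)))

# Synthetic demo: one dominant role axis plus isotropic noise (not real activations).
rng = np.random.default_rng(1)
n_roles, d_model = 275, 64
axis = rng.normal(size=d_model)
axis /= np.linalg.norm(axis)
roles = np.outer(rng.normal(scale=10.0, size=n_roles), axis)
roles += rng.normal(scale=0.5, size=(n_roles, d_model))
# Hypothetical Assistant activation placed at one end of the role axis.
assistant = roles.mean(axis=0) + 20.0 * axis

k = components_for_variance(roles)
cos = contrast_pc1_cosine(roles, assistant)
print(k, round(cos, 3))
```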