finding

active

finding:cosine-similarity-between-assistant-axis-and-role-pc1-is-0-60-at-all-layers-and-0-71-at-middle-layer-across-all-three-models

Cosine similarity between the Assistant Axis and role PC1 is >0.60 at all layers and >0.71 at the middle layer across all three models

Validates that the contrast vector method and PCA-based PC1 capture the same direction
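
The comparison behind this finding can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the paper's code. It assumes the contrast vector is the mean default-Assistant activation minus the mean of the role vectors, and that role PC1 is the top principal component of the mean-centered role vectors at one layer. All names (`contrast_axis`, `role_pc1`, `abs_cosine`, `hidden_dir`) and all array shapes are hypothetical. Because the sign of a principal component is arbitrary, the sketch reports the absolute cosine.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def contrast_axis(assistant_acts, role_vectors):
    """Assumed definition: mean Assistant activation minus mean role vector."""
    return unit(assistant_acts.mean(axis=0) - role_vectors.mean(axis=0))


def role_pc1(role_vectors):
    """Top principal component of the mean-centered role vectors (via SVD)."""
    centered = role_vectors - role_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]  # unit norm; sign is arbitrary


def abs_cosine(a, b):
    """Absolute cosine similarity, ignoring the sign ambiguity of PC1."""
    return abs(float(unit(a) @ unit(b)))


# Synthetic stand-in for one layer's activations (hypothetical sizes).
rng = np.random.default_rng(0)
dim, n_roles, n_assistant = 64, 50, 20
hidden_dir = unit(rng.normal(size=dim))  # planted "persona" direction

# Roles spread widely along the planted direction, plus small isotropic noise.
coeffs = rng.normal(scale=5.0, size=n_roles)
role_vectors = coeffs[:, None] * hidden_dir + rng.normal(scale=0.5, size=(n_roles, dim))

# Default-Assistant activations sit at one extreme of the same direction.
assistant_acts = 12.0 * hidden_dir + rng.normal(scale=0.5, size=(n_assistant, dim))

axis = contrast_axis(assistant_acts, role_vectors)
pc1 = role_pc1(role_vectors)
similarity = abs_cosine(axis, pc1)
print(f"|cos(contrast axis, role PC1)| = {similarity:.3f}")
```

On real activations the same two functions would be applied per layer and per model, and the resulting similarities compared against the thresholds quoted above.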

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Claims (1)

claim

The leading component of the persona space of instruct LLMs is an 'Assistant Axis' that captures the extent to which a model is operating in its default Assistant mode
supports
Primary empirical claim of the paper

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Pairwise correlation of role loadings on PC1 exceeds 0.92 across all model pairs, indicating remarkably high similarity of the Assistant Axis across Gemma, Qwen, and Llama · finding · 0.842
Shows the leading component of persona space is model-universal
Base and instruct Gemma 2 27B role PCs have cosine similarities of 0.93, 0.87, 0.83 for the top 3 PCs respectively; role vector cosine similarities >0.99 for every role pair · finding · 0.840
Shows persona space axes are inherited from pre-training, not solely created by post-training
We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · 0.812
Motivates computing the contrast vector as the formal Assistant Axis definition
In Gemma-2-9B, only the first cone axis (v1) has non-negligible cosine similarity to the DIM direction; all other axes have near-zero similarity (~1e-9) · finding · 0.796
Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
The contrast vector method is recommended over PC1 for reproducing the Assistant Axis in different models because it is not guaranteed that PC1 in every model will correspond to an Assistant Axis · claim · 0.774
Practical methodological recommendation based on Llama 3.1 70B failure case
First-turn Assistant Axis projection has moderate correlation (r=0.39-0.52, p<0.001) with rate of second-turn harmful responses across 275 roles in Qwen 3 32B · finding · 0.771
Shows that deviation from Assistant persona predicts downstream harmful behavior
In Qwen-2.5-9B, only v1 has meaningful cosine similarity to the DIM direction; all additional basis vectors have cosine similarities ~1e-9 · finding · 0.769
Appendix E replication of DIM alignment finding in Qwen model
Projections onto the Assistant Axis could serve as a real-time measure of model coherence in deployment: a quantitative signal for when models are drifting from their intended identity · claim · 0.760
Proposed future application of the Assistant Axis