claim

active

claim:the-contrast-vector-method-is-recommended-over-pc1-for-reproducing-the-assistant-axis-in-different-models-because-it-is-not-guaranteed-that-pc1-in-every-model-will-correspond-to-an-assistant-axis

The contrast vector method is recommended over PC1 for reproducing the Assistant Axis in different models because it is not guaranteed that PC1 in every model will correspond to an Assistant Axis

Practical methodological recommendation based on the Llama 3.1 70B failure case

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Assistant Axis
supports
Contrast vector between mean default Assistant activation and mean of all fully role-playing role vectors; main contribution of the paper
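
The definition above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes activations have already been extracted as arrays at a single layer, and the function names (`contrast_axis`, `role_pc1`, `cosine`), the sign convention (Assistant minus roles), and the synthetic data are all hypothetical. It computes the contrast vector, computes PC1 of the role vectors, and compares the two directions.

```python
import numpy as np


def contrast_axis(assistant_acts, role_vectors):
    """Mean default-Assistant activation minus the mean of the role vectors.

    The sign (Assistant minus roles) is an assumption made for this sketch.
    """
    assistant_acts = np.asarray(assistant_acts, dtype=float)
    role_vectors = np.asarray(role_vectors, dtype=float)
    return assistant_acts.mean(axis=0) - role_vectors.mean(axis=0)


def role_pc1(role_vectors):
    """First principal component (unit norm) of the mean-centered role vectors."""
    role_vectors = np.asarray(role_vectors, dtype=float)
    centered = role_vectors - role_vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic stand-in data (NOT real model activations): roles vary mostly
# along one hidden direction, and the default Assistant sits at one end of it.
rng = np.random.default_rng(0)
d = 64
direction = np.zeros(d)
direction[0] = 1.0

role_coeffs = rng.normal(loc=-5.0, scale=5.0, size=200)
role_vectors = rng.normal(size=(200, d)) + np.outer(role_coeffs, direction)
assistant_acts = rng.normal(size=(50, d)) + 5.0 * direction

axis = contrast_axis(assistant_acts, role_vectors)
pc1 = role_pc1(role_vectors)

# The sign of a principal component is arbitrary, so compare the absolute
# cosine when checking whether PC1 and the contrast vector agree.
similarity = abs(cosine(axis, pc1))
print(f"|cos(contrast axis, role PC1)| = {similarity:.3f}")
```

In this toy setup the two directions agree because the data were built that way. In a real model the contrast vector is anchored to the Assistant by construction, whereas PC1 is simply the direction of largest variance among roles, so a check like the one above is a reasonable sanity test before treating PC1 as the axis.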

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Steering vector constructed from all 16 contrastive pairs outperforms most single-pair vectors; best 4-pair vector outperforms full 16-pair vector · finding · 0.790
Demonstrates averaging multiple prompt pairs reduces noise; optimal subset selection further improves performance.
Cosine similarity between Assistant Axis and role PC1 is >0.60 at all layers and >0.71 at middle layer across all three models · finding · 0.774
Validates that the contrast vector method and PCA-based PC1 capture the same direction
Our method achieves superior performance compared to Contrastive Activation Addition. · finding · 0.773
Performance gains over CAA in steering tasks.
Pairwise correlation of role loadings on PC1 exceeds 0.92 across all model pairs, indicating remarkably high similarity of the Assistant Axis across Gemma, Qwen, and Llama · finding · 0.770
Shows the leading component of persona space is model-universal
We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · 0.761
Motivates computing the contrast vector as the formal Assistant Axis definition
The Assistant Axis is also present in pre-trained base models, where it primarily promotes helpful human archetypes (consultants, coaches) and inhibits spiritual ones · claim · 0.750
Extends the Assistant Axis finding to pre-trained base models, suggesting that pre-training rather than post-training creates the axis
Four best contrastive prompt pairs outperform full 16-pair average steering vector for type hint suppression · finding · 0.749
Optimization result for steering vector construction.
The Assistant Axis in instruct models mainly inherits from pre-existing helpful and harmless human personas in base models, later acquiring additional associations (such as being an AI) during post-training · claim · 0.747
Key mechanistic claim about the developmental origin of the Assistant persona