claim
active
claim:the-contrast-vector-method-is-recommended-over-pc1-for-reproducing-the-assistant-axis-in-different-models-because-it-is-not-guaranteed-that-pc1-in-every-model-will-correspond-to-an-assistant-axis
The contrast vector method is recommended over PC1 for reproducing the Assistant Axis in different models because it is not guaranteed that PC1 in every model will correspond to an Assistant Axis
Practical methodological recommendation based on Llama 3.1 70B failure case
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Assistant Axis · supports · Contrast vector between the mean default Assistant activation and the mean of all fully role-playing role vectors; main contribution of the paper
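The difference between the two constructions can be sketched on synthetic data. The snippet below is a minimal illustration, not the paper's implementation: the function names, array shapes, the sign convention (Assistant mean minus role mean), and the toy activations are all assumptions. It builds a case where the largest source of variance among role vectors is unrelated to the Assistant direction, so PC1 misses it while the contrast vector still recovers it.

```python
import numpy as np

def contrast_vector(assistant_acts, role_vectors):
    # Mean default-Assistant activation minus mean role vector.
    # The sign convention here is an assumption.
    return assistant_acts.mean(axis=0) - role_vectors.mean(axis=0)

def first_principal_component(role_vectors):
    # PC1 of the mean-centered role vectors, via SVD.
    centered = role_vectors - role_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic toy data (hypothetical, not from any real model).
rng = np.random.default_rng(0)
d, n_roles, n_assistant = 64, 200, 50
assistant_dir = np.zeros(d)
assistant_dir[0] = 1.0   # direction separating Assistant from roles
nuisance_dir = np.zeros(d)
nuisance_dir[1] = 1.0    # unrelated direction with high variance across roles

role_vectors = (
    -2.0 * assistant_dir
    + rng.normal(0.0, 3.0, size=(n_roles, 1)) * nuisance_dir
    + rng.normal(0.0, 0.1, size=(n_roles, d))
)
assistant_acts = 2.0 * assistant_dir + rng.normal(0.0, 0.1, size=(n_assistant, d))

axis = contrast_vector(assistant_acts, role_vectors)
pc1 = first_principal_component(role_vectors)

# PC1 is only defined up to sign, so compare absolute cosine similarity.
alignment = abs(cosine(axis, pc1))
print(f"cos(contrast, assistant_dir) = {cosine(axis, assistant_dir):.3f}")
print(f"|cos(contrast, PC1)|         = {alignment:.3f}")
```

In this toy setup PC1 follows the high-variance nuisance direction, so a low alignment score is a signal that PC1 should not be treated as the Assistant Axis for that model. On real activations, the same cosine check is one way to test whether the two methods agree before relying on PC1.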
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Demonstrates averaging multiple prompt pairs reduces noise; optimal subset selection further improves performance.
- Validates that the contrast vector method and PCA-based PC1 capture the same direction
- Performance gains over CAA in steering tasks.
- Shows the leading component of persona space is model-universal
- We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · 0.761 · Motivates computing the contrast vector as the formal Assistant Axis definition
- Extends the Assistant Axis finding to pre-training, suggesting pre-training rather than post-training creates the axis
- Four best contrastive prompt pairs outperform full 16-pair average steering vector for type hint suppression · finding · 0.749 · Optimization result for steering vector construction.
- Key mechanistic claim about the developmental origin of the Assistant persona