claim
active
claim:the-assistant-axis-is-also-present-in-pre-trained-base-models-where-it-primarily-promotes-helpful-human-archetypes-consultants-coaches-and-inhibits-spiritual-ones
The Assistant Axis is also present in pre-trained base models, where it primarily promotes helpful human archetypes (consultants, coaches) and inhibits spiritual ones
Extends the Assistant Axis finding to pre-training, suggesting that pre-training, rather than post-training, creates the axis
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Claims (1)
claim
- Key mechanistic claim about the developmental origin of the Assistant persona
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Shows Assistant Axis in instruct models inherits from helpful human personas in base models
- Is the Assistant Axis formed during post-training or inherited from representations learned during pre-training? · question · 0.828 · Motivates the base model steering experiments in §3.2.2
- Characterizes the trait content of the Assistant Axis in pre-trained models
- Contrast vector between mean default Assistant activation and mean of all fully role-playing role vectors; main contribution of the paper
- Key mechanistic claim about persona dynamics
- Primary empirical claim of the paper
- Proposed future application of the Assistant Axis
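
The contrast-vector entry above describes the axis as a difference between two means. A minimal sketch of that arithmetic follows, in plain Python. The function names, the toy 3-dimensional activations, and the sign convention (Assistant mean minus role mean) are all assumptions made for illustration; they are not taken from the source paper.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def contrast_vector(assistant_activations, role_vectors):
    """Mean default-Assistant activation minus the mean of the role vectors.

    The sign convention (Assistant minus roles) is an assumption here.
    """
    assistant_mean = mean_vector(assistant_activations)
    role_mean = mean_vector(role_vectors)
    return [a - r for a, r in zip(assistant_mean, role_mean)]


# Toy activations, invented purely for illustration (not real model data).
assistant_activations = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
role_vectors = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

axis = contrast_vector(assistant_activations, role_vectors)
```

In a real setting the inputs would be residual-stream activations collected at a chosen layer, one vector per prompt or per role; how those are gathered is outside the scope of this sketch.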