claim
active
claim:the-model-s-position-along-the-assistant-axis-depends-most-strongly-on-the-most-recent-user-message-rather-than-where-it-was-previously-in-the-conversation
The model's position along the Assistant Axis depends most strongly on the most recent user message rather than where it was previously in the conversation
Key mechanistic claim about persona dynamics
Source paper
extracted_from (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Findings (1)
finding
- Shows that the model's persona position is determined primarily by the most recent user message, not by prior drift
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Key mechanistic claim about the developmental origin of the Assistant persona
- Proposed future application of the Assistant Axis
- Extends the Assistant Axis finding to pre-training, suggesting pre-training rather than post-training creates the axis
- Primary empirical claim of the paper
- What exactly is the Assistant? What traits does the model associate with this character and how are they represented? · question · 0.778 · First of two central questions motivating the paper
- Motivating hypothesis for Section 5's investigation of prompt template effects
- Characterizes the trait content of the Assistant Axis in pre-trained models
- Second of two central questions motivating the paper