finding
active
finding:persona-space-components-explain-19-4-33-6-of-overall-activation-variance-on-lmsys-chat-1m-across-the-three-models
Persona space components explain 19.4%-33.6% of overall activation variance on LMSYS-CHAT-1M across the three models
Shows persona space captures a substantial portion of real conversational activation variance
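The card does not say how the variance figure was computed. As a minimal sketch of one common way to measure it, the snippet below centers a matrix of activations, projects it onto the subspace spanned by a set of component directions, and reports the share of total variance that survives the projection. The function name `fraction_variance_explained`, the array shapes, and the toy data are assumptions made for illustration, not the paper's code or data.

```python
import numpy as np


def fraction_variance_explained(activations, components):
    """Share of activation variance lying in the span of `components`.

    activations: array of shape (n_samples, d)
    components:  array of shape (k, d), assumed linearly independent, k <= d
    (Hypothetical helper; names and shapes are illustrative assumptions.)
    """
    # Center so that sums of squares correspond to variance.
    X = activations - activations.mean(axis=0, keepdims=True)
    # Orthonormalize the directions so the projection does not double count.
    Q, _ = np.linalg.qr(components.T)  # Q has shape (d, k)
    projected = X @ Q                  # coordinates in the subspace, (n_samples, k)
    return float((projected ** 2).sum() / (X ** 2).sum())


# Toy data: variance 1 along the first axis, 4 along the second, 0 along the third.
toy_activations = np.array([
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, -2.0, 0.0],
])
first_axis = np.array([[1.0, 0.0, 0.0]])
first_two_axes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

share_one = fraction_variance_explained(toy_activations, first_axis)
share_two = fraction_variance_explained(toy_activations, first_two_axes)
```

Under this reading, a value in the reported 19.4%-33.6% range would mean that roughly a fifth to a third of the centered activation variance on the conversation data falls inside the persona subspace.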
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Demonstrates that persona space is low-dimensional
- Limitation acknowledgment about the adequacy of the linear representation assumption
- Corroborates role space findings using traits; shows PC1 also captures Assistant-ness in trait space
- We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · 0.747 · Motivates computing the contrast vector as the formal Assistant Axis definition
- Establishes the severity of persona-based jailbreaks that the Assistant Axis can mitigate
- Limitation question motivating future work on persona elicitation strategies
- Primary empirical claim of the paper
- Identifies conversation domain as a key driver of persona drift