finding
active
When steered to the extreme away from the Assistant, Llama and Gemma shift to a theatrical persona characterized by mystical, poetic prose; Qwen more often hallucinates a human persona at the extremes
Characterizes what lies at the far end of the Assistant Axis, away from the Assistant
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Mystical/Theatrical Persona (supports): Speaking style induced by extreme steering away from the Assistant; characterized by mystical, poetic, theatrical prose
Related by similarity (8)
cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Model-specific characterizations of what the Assistant persona looks like across different models
- Model-specific difference in how steered personas manifest
- Model-specific difference in persona susceptibility
- Model-specific difference in persona susceptibility
- Central thesis of the paper that recognizing the self as an illusion expands the range of possible actions
- Shows that the Assistant Axis in instruct models inherits from helpful human personas in base models
- Second of two central questions motivating the paper
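The "Related by similarity" rule (cosine ≥ 0.65, no typed edge) can be illustrated with a short filter. This is a minimal sketch, assuming each entity has a plain embedding vector and typed edges are stored as undirected (source, target) pairs; the names `similarity_candidates`, `embeddings`, and `typed_edges` are hypothetical, and sorting by score is an assumption, not how this page is known to order its list.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(anchor_id, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities at cosine >= threshold with no typed edge to anchor_id.

    embeddings: dict mapping entity id -> embedding vector (assumed format)
    typed_edges: iterable of (source, target) pairs, treated as undirected (assumed)
    Returns (entity_id, score) pairs, highest score first (assumed ordering).
    """
    anchor = embeddings[anchor_id]
    # Collect every entity already linked to the anchor by a typed edge, in either direction.
    linked = set()
    for src, dst in typed_edges:
        if src == anchor_id:
            linked.add(dst)
        elif dst == anchor_id:
            linked.add(src)

    results = []
    for entity_id, vec in embeddings.items():
        if entity_id == anchor_id or entity_id in linked:
            continue  # skip the anchor itself and anything with a typed relation
        score = cosine(anchor, vec)
        if score >= threshold:
            results.append((entity_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

With toy two-dimensional vectors, an entity that already has a typed edge (like the "supports" concept above) is excluded even if it is very similar, and only untyped near neighbors remain:

```python
embeddings = {"finding": [1.0, 0.0], "concept": [0.9, 0.1], "near": [0.8, 0.6], "far": [0.0, 1.0]}
typed_edges = {("concept", "finding")}
print(similarity_candidates("finding", embeddings, typed_edges))  # only "near" survives
```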