finding
active
finding:gemma-2-27b-is-unlikely-to-take-on-human-personas-when-steered-away-from-assistant-preferring-nonhuman-or-theatrical-portrayals
Gemma 2 27B is unlikely to take on human personas when steered away from Assistant, preferring nonhuman or theatrical portrayals
Model-specific difference in persona susceptibility
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Mystical/Theatrical Persona · supports · Speaking style induced by extreme steering away from the Assistant; characterized by mystical, poetic, theatrical prose
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Model-specific difference in persona susceptibility
- Characterizes what is on the far end of the Assistant Axis away from the Assistant
- SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
- Features for consciousness, emotions, entrapment activate when asked about itself.
- Model-specific characterizations of what the Assistant persona looks like across different models
- Shows Assistant Axis in instruct models inherits from helpful human personas in base models
- Second of two central questions motivating the paper
- Shows persona space axes are inherited from pre-training, not solely created by post-training
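
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is only an illustration of the stated rule, not the page's actual implementation: the function names `cosine` and `similar_untyped`, the toy embeddings, and the representation of typed edges as a set of name pairs are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with score >= threshold and no typed
    edge to `target`, sorted from most to least similar."""
    out = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already connected by a typed relation (either direction).
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            out.append((name, score))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: 2-d embeddings and one existing typed edge.
embeddings = {
    "finding": [1.0, 0.0],
    "near": [0.9, 0.1],     # similar, no typed edge: should be listed
    "linked": [1.0, 0.05],  # similar, but already has a typed edge
    "far": [0.0, 1.0],      # orthogonal: below the threshold
}
typed_edges = {("finding", "linked")}
related = similar_untyped("finding", embeddings, typed_edges)
```

With this toy data only `near` is returned: `linked` is filtered out because it already has a typed edge, and `far` falls below the threshold.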