Persona vectors: Monitoring and controlling character traits in language models
arXiv:2507.21509 · 2025
By Runjin Chen · Andy Arditi · Henry Sleight · Owain Evans · Jack Lindsey
Related work (references, corpus, and external arXiv)
Cited, in-corpus, and arXiv badges show which signals surfaced each row; rows surfaced by multiple sources are weighted higher.
- Persona Vectors in Games: Measuring and Steering Strategies via Activation Vectors · Andrew Zhang, Johnathan Sun · 2026 · ≈ 78%
- Facet-Level Persona Control by Trait-Activated Routing with Contrastive SAE for Role-Playing LLMs · Wenqiu Tang, Zhen Wan, Takahiro Komamizu, Ichiro Ide · 2026 · ≈ 75%
- Creating user stereotypes for persona development from qualitative data through semi-automatic subspace clustering · Thomas Bjorner, Pernille Krog Sorensen, Paolo Burelli, Dannie Korsgaard · 2026 · ≈ 75%
- BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation · Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-Yi Lee, Kai-Wei Chang, Tsung-Min Pai · 2026 · ≈ 74%
- Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI · Henning Bartsch, Nathan Lambert, Evan Hubinger, Sharan Maiya · 2025 · ≈ 74%
- The Power of Personality: A Human Simulation Perspective to Investigate Large Language Model Agents · Yihong Tang, Xuefeng Bai, Kehai Chen, Juntao Li, Min Zhang, Yifan Duan · 2025 · ≈ 73%
- The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation · Katharina Dworatzyk, Sophie Jentzsch, Peer Schütt, Sabine Theis, Tobias Hecking, Diaoulé Diallo · 2026 · ≈ 72%
- Controllable and explainable personality sliders for LLMs at inference time · David Khachaturov, Robert Mullins, Mark Huasong Meng, Florian Hoppe · 2026 · ≈ 72%
- Evaluating Large Language Models with Psychometrics · Yue Huang, Hongyi Wang, Ying Cheng, Xiangliang Zhang, James Zou, Lichao Sun, Yuan Li · 2025 · ≈ 71%
- Steering at the Source: Style Modulation Heads for Robust Persona Control · Gouki Minegishi, Koshi Eguchi, Sosuke Hosokawa, Kenjiro Taura, Yoshihiro Izawa · 2026 · ≈ 71%
- What Can We Actually Steer? A Multi-Behavior Study of Activation Control · Krystian Novak, Tetiana Bas · 2026 · ≈ 71%
- Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs · Fan Yang, Shaunak A. Mehta, Koichi Onoue, Wenkai Li · 2026 · ≈ 71%
- "You tell me": A Dataset of GPT-4-Based Behaviour Change Support Conversations · Selina Meyer, David Elsweiler · 2026 · ≈ 71%
- Psychological Steering of Large Language Models · in corpus · 2026 · ≈ 69%
- Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation · in corpus · 2026 · ≈ 68%
- Interpreting Language Model Parameters · in corpus · 2026 · ≈ 67%
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders · in corpus · 2026 · ≈ 67%
- Anima Labs Phenomenology Pt1 · in corpus · ≈ 66%
Cited by (3)
- Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders
Sparse Autoencoder (SAE)-based contrastive feature retrieval can reliably identify and bidirectionally steer high-order semantic features in LLMs, outperforming Contrastive Activation Addition (CAA)…
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering (intervening on model activations along paths constrained to lie on a learned activation manifold M_h rather than along Euclidean linear directions) produces behavioral trajectories…
- Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training
Probe-based data attribution, introduced here as a method for surfacing and mitigating undesirable post-training behaviors, reduces harmful compliance in OLMo 2 7B by 63% through datapoint filtering…