finding
active
finding:steering-base-models-toward-the-assistant-axis-increases-agreeableness-traits-friendly-kind-helpful-and-decreases-extraversion-in-gemma-and-openness-in-llama

Steering base models toward the Assistant Axis increases agreeableness traits (friendly, kind, helpful) and decreases extraversion (in Gemma) and openness (in Llama)

Characterizes the trait content of the Assistant Axis in pre-trained models

Source paper

extracted_from
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
