question · active
How does different post-training data shift a model's position along persona dimensions?
Future work direction: using persona space to study effects of training data on model character
Source paper
extracted_from (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood (ranked by edge count)
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or unrecognized duplicates.
- Load-bearing summary of the paper's core finding about persona stability
- Central interpretive claim and motivation for future work
- Limitation question motivating future work on persona elicitation strategies
- Motivated by near-identical PCs for base and instruct Gemma
- What if the concept being manipulated does not lie on a straight line in the model's representations? (question · 0.770) The motivating question that opens the paper and leads to the development of manifold steering.
- Key mechanistic claim about the developmental origin of the Assistant persona
- Can off-the-rails behavior in models be attributed to their persona drifting away from the Assistant? (question · 0.769) Motivates the multi-turn conversation drift experiments in §4.
- Finding that base models have high false-positive rates and no net performance gain.