hypothesis
active
We hypothesize that axes of persona differentiation within LLMs are likely already present in base models and inherited from the pre-training corpus
Motivated by near-identical principal components (PCs) for base and instruct Gemma
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Findings (1)
finding
- Shows persona space axes are inherited from pre-training, not solely created by post-training
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- How does different post-training data shift a model's position along persona dimensions? · question · 0.800 · Future work direction: using persona space to study effects of training data on model character
- Key mechanistic claim about the developmental origin of the Assistant persona
- Limitation question motivating future work on persona elicitation strategies
- Primary empirical claim of the paper
- Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- The paper's strong claim that there is no underlying authentic agent behind the simulator, only layers of role play
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.