PCA on Persona Space
method · active · method:pca-on-persona-space
Standardized PCA run on role vectors to find the main axes of persona variation.
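A minimal sketch of standardized PCA on a matrix of role vectors follows. It assumes one row per persona, with columns indexing activation dimensions, and it assumes "standardized" means z-scoring each dimension before decomposition. The function name `standardized_pca` and the returned tuple layout are hypothetical, not the authors' code. The axes come from an SVD of the standardized matrix. The sign of each principal component is arbitrary.

```python
import numpy as np


def standardized_pca(role_vectors: np.ndarray, n_components: int = 3):
    """PCA on z-scored role vectors (hypothetical helper).

    role_vectors: shape (n_roles, d_model), one mean activation vector per
    persona (assumed layout). Returns (components, explained_variance_ratio,
    scores) for the first n_components principal axes.
    """
    X = np.asarray(role_vectors, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant dimensions unscaled instead of dividing by zero
    Z = (X - mean) / std  # assumption: "standardized" = z-score each dimension

    # Rows of Vt are orthonormal principal axes, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)  # variance captured by each axis
    ratio = var / var.sum()        # explained-variance ratio

    k = min(n_components, Vt.shape[0])
    scores = Z @ Vt[:k].T          # coordinates of each role on the first k PCs
    return Vt[:k], ratio[:k], scores
```

Running the SVD directly on the standardized matrix avoids forming a d_model × d_model covariance matrix. That matters because the number of roles is usually much smaller than the hidden size.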
Neighborhood (ranked by edge count)
Methods (1)
- Pipeline for extracting mean post-MLP residual stream activations from model responses under persona-specific system prompts to produce role vectors
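The averaging step of that pipeline can be sketched as follows. Capturing the activations (running the model under each persona system prompt and hooking the post-MLP residual stream at a chosen layer) is not shown and is assumed to be done already. The helper name `role_vectors` and the input layout are hypothetical. The sketch also assumes all response tokens for a role are pooled before taking the mean. Averaging per-response means first would be an equally plausible reading.

```python
import numpy as np


def role_vectors(activations_by_role: dict[str, list[np.ndarray]]) -> tuple[list[str], np.ndarray]:
    """Average cached activations into one vector per role (hypothetical helper).

    activations_by_role maps a role name to a list of arrays, one per model
    response, each of shape (n_response_tokens, d_model), holding post-MLP
    residual-stream activations at one layer (assumed already captured).
    """
    names = sorted(activations_by_role)  # fixed row order for later PCA
    vecs = []
    for name in names:
        # Pool every response token for this role, then average, so longer
        # responses contribute proportionally more tokens to the mean.
        tokens = np.concatenate(activations_by_role[name], axis=0)
        vecs.append(tokens.mean(axis=0))
    return names, np.stack(vecs)  # shape (n_roles, d_model)
```

The resulting matrix has one row per role and can be passed directly to a PCA routine.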
Related by similarity (8)
cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Low-dimensional space of activation directions corresponding to diverse character archetypes in LLMs
- Statistical method used to analyze neural activity data.
- Used to visually inspect separation of truth-related directions in model activation space across layers
- PCA on 171 emotion-probe activations across all tokens, producing ordered linear combinations to test whether lower PCs are more persistent
- Justifies choosing PCA over UMAP or t-SNE for the node-structured RN model.
- Limitation question motivating future work on persona elicitation strategies
- We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · similarity 0.698 · Motivates computing the contrast vector as the formal Assistant Axis definition
- Keeping a model anchored to its intended persona during deployment, preventing drift to harmful behaviors
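The hypothesis that PC1 of role space tracks deviation from the Assistant persona can be checked by comparing PC1 with a contrast vector. The exact contrast-vector definition is not given on this page. The sketch below assumes a hypothetical one: the Assistant role vector minus the mean of all role vectors. All function names are illustrative. Because principal components have arbitrary sign, the comparison uses the absolute cosine.

```python
import numpy as np


def contrast_vector(assistant_vec, role_vecs) -> np.ndarray:
    # Hypothetical definition: Assistant vector minus the mean role vector.
    return np.asarray(assistant_vec, float) - np.asarray(role_vecs, float).mean(axis=0)


def pc1(role_vecs) -> np.ndarray:
    # First principal axis of mean-centered (not z-scored) role vectors, so
    # it lives in the same raw activation coordinates as the contrast vector.
    X = np.asarray(role_vecs, float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]


def abs_cosine(u, v) -> float:
    # PC signs are arbitrary, so compare directions only up to sign.
    u = np.asarray(u, float)
    v = np.asarray(v, float)
    return abs(float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))))
```

This sketch uses unstandardized PCA on purpose. Z-scoring rescales each dimension, so a standardized-space axis and a raw-space contrast vector would not share a coordinate system unless one is mapped into the other's space first.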