PCA on Persona Space

Standardized PCA applied to role vectors to find the main axes of persona variation
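
As a rough illustration of what standardized PCA over role vectors could look like, the sketch below z-scores each activation dimension and takes principal components from an SVD. The function name `standardized_pca`, the array shapes, and the random stand-in data are all assumptions made for this example; they are not taken from the source pipeline.

```python
import numpy as np

def standardized_pca(role_vectors, n_components=3):
    """Z-score each feature, then return the top principal axes,
    the per-role scores, and the explained-variance ratios."""
    X = np.asarray(role_vectors, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero on constant features
    Z = (X - mean) / std                     # standardized (zero mean, unit variance)
    # Rows of Vt are the principal axes of the standardized data.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T                # coordinates of each role on each PC
    explained = (S ** 2) / np.sum(S ** 2)    # fraction of variance per PC
    return components, scores, explained[:n_components]

# Hypothetical stand-in data: 40 roles, 16 activation dimensions.
rng = np.random.default_rng(0)
role_vectors = rng.normal(size=(40, 16))
components, scores, explained = standardized_pca(role_vectors)
```

Here `components[0]` would play the role of PC1, and `scores[:, 0]` would give each role's position along it.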

Neighborhood — ranked by edge-count

method

Role Vector Extraction
uses
Pipeline for extracting mean post-MLP residual stream activations from model responses under persona-specific system prompts to produce role vectors
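
The averaging step in this pipeline can be sketched as follows, assuming the post-MLP residual stream activations have already been collected as one array per response. The helper name `role_vector`, the choice to average over tokens first and then over responses, and the toy arrays are assumptions for illustration only; the source does not specify the averaging order.

```python
import numpy as np

def role_vector(response_activations):
    """Collapse a persona's activations into one vector.

    response_activations: list of (n_tokens, d_model) arrays, one per
    model response generated under the same persona system prompt.
    """
    # Assumed order: mean over tokens within a response, then over responses.
    per_response = [np.asarray(a, dtype=float).mean(axis=0)
                    for a in response_activations]
    return np.mean(per_response, axis=0)

# Toy stand-in activations: two responses of different lengths, d_model = 4.
acts = [np.ones((3, 4)), 3.0 * np.ones((5, 4))]
vec = role_vector(acts)
```

Averaging within each response first keeps long responses from dominating the role vector; pooling all tokens before averaging would be an equally plausible alternative.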

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Persona Space · concept · 0.826
Low-dimensional space of activation directions corresponding to diverse character archetypes in LLMs
Principal components analysis (PCA) · method · 0.765
Statistical method used to analyze neural activity data.
PCA Visualization · method · 0.730
Used to visually inspect separation of truth-related directions in model activation space across layers
PCA of Emotion Feature Activations · method · 0.719
PCA on 171 emotion probe activations across all tokens, producing ordered linear combinations and testing whether lower PCs are more persistent
PCA is the appropriate dimensionality reduction technique for constructing the RN because it preserves global structure and provides deterministic, interpretable projections. · claim · 0.704
Justifies PCA choice over UMAP or t-SNE for the node-structured RN model.
What dimensions of persona are not captured by our extracted role vectors, and how complete is the current persona space mapping? · question · 0.704
Limitation question motivating future work on persona elicitation strategies
We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · hypothesis · 0.698
Motivates computing the contrast vector as the formal Assistant Axis definition
Persona Stabilization · concept · 0.687
Keeping a model anchored to its intended persona during deployment, preventing drift to harmful behaviors