finding
active
Default Assistant activation projects to one extreme of PC1 with minimum distance to edge of 0.03, while projecting to intermediate values (0.27-0.50) on all other PCs
Empirically confirms PC1 measures similarity to the Assistant persona
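The card does not define "distance to edge". One plausible reading, given here only as an assumption, is that each role's projection onto a principal component is rescaled to the [0, 1] range spanned by all roles on that component, and the distance to edge is the gap between the default Assistant's rescaled position and the nearer end of that range. The minimal sketch below illustrates that reading. The function names and the toy scores are hypothetical and are not taken from the source paper.

```python
def normalized_position(role_scores, default_score):
    """Rescale default_score to [0, 1] using the min/max of role_scores.

    Assumption: "edge" means the extreme role projections on this component.
    Values outside the observed range are clamped to the nearest edge.
    """
    lo, hi = min(role_scores), max(role_scores)
    if hi == lo:
        raise ValueError("role scores have zero range on this component")
    pos = (default_score - lo) / (hi - lo)
    return min(1.0, max(0.0, pos))


def distance_to_edge(role_scores, default_score):
    """Distance from the rescaled position to the nearer end of [0, 1]."""
    pos = normalized_position(role_scores, default_score)
    return min(pos, 1.0 - pos)


# Toy projections (invented for illustration, not data from the paper).
pc1_roles = [-2.0, -1.0, 0.0, 1.0, 2.0]
pc2_roles = [-1.0, 0.0, 1.0]

print(round(distance_to_edge(pc1_roles, 1.88), 2))  # near an extreme
print(round(distance_to_edge(pc2_roles, -0.2), 2))  # intermediate
```

Under this reading, a value near 0 means the default activation sits at one end of the component, and a value near 0.5 means it sits in the middle, which matches the contrast the finding draws between PC1 and the other PCs.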
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- We hypothesize that the PC1 axis of role space measures deviation from the Assistant persona · supports · Motivates computing the contrast vector as the formal Assistant Axis definition
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrates Assistant attractor dynamics in practice
- Calibration finding for choosing the activation cap threshold
- Feature manipulation alters persona.
- Shows model persona position is primarily determined by the most recent user message, not prior drift
- Characterizes what the Assistant persona resembles in terms of human archetypes
- Main result: steering elicits deployment behavior even when the evaluation cue is present and prompting fails.
- Validates that the contrast vector method and PCA-based PC1 capture the same direction
- Empirical characterization of conversation domains that are safe for model persona stability