finding
active
Pairwise similarity of trait PC1 across all three models is >0.81; no pairwise correlation in top 3 trait PCs is below 0.70
Shows trait space has more cross-model consistency than role space beyond PC1
Source paper
extracted_from (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows the leading component of persona space is model-universal
- Experiment 3 comparison: zero-shot control shows lower semantic convergence than experimental condition
- SAE features are not simply mirroring individual neurons.
- Validates that the contrast vector method and PCA-based PC1 capture the same direction
- Shows persona space axes are inherited from pre-training, not solely created by post-training
- Strength comparison accuracy reaches 73% at layer 3 for injection pair (2,6) vs. 50% chance · finding · 0.756 · Secondary positive result for strength comparison showing graded sensitivity to perturbation magnitude
- Shows that introspective accuracy scales with injection strength difference, not binary detection
- Corroborates role space findings using traits; shows PC1 also captures Assistant-ness in trait space