finding
active
Base and instruct Gemma 2 27B role PCs have cosine similarities of 0.93, 0.87, 0.83 for the top 3 PCs respectively; role vector cosine similarities >0.99 for every role pair
Shows persona space axes are inherited from pre-training, not solely created by post-training
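As a rough illustration of the kind of comparison behind these numbers, the sketch below computes per-component PC cosine similarities and per-role vector cosine similarities between two sets of role vectors. It is not the source paper's pipeline: the data is synthetic, the helper names (`top_pcs`, `pc_cosines`, `role_cosines`) are hypothetical, and it assumes each model supplies one vector per role with rows in the same role order.

```python
import numpy as np

def top_pcs(role_vectors, k=3):
    """Return the top-k principal directions (as rows) of mean-centered role vectors."""
    centered = role_vectors - role_vectors.mean(axis=0, keepdims=True)
    # Rows of vt are unit-norm principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def pc_cosines(pcs_a, pcs_b):
    """Cosine similarity between matching PCs; abs() because a PC's sign is arbitrary."""
    return np.abs(np.sum(pcs_a * pcs_b, axis=1))

def role_cosines(vecs_a, vecs_b):
    """Cosine similarity between the two models' vectors for each role (row-wise)."""
    dots = np.sum(vecs_a * vecs_b, axis=1)
    norms = np.linalg.norm(vecs_a, axis=1) * np.linalg.norm(vecs_b, axis=1)
    return dots / norms

# Synthetic stand-ins (assumption): 50 roles, 64-dim activations, three dominant axes.
rng = np.random.default_rng(0)
scales = np.concatenate([[12.0, 8.0, 5.0], np.ones(61)])
base = rng.normal(size=(50, 64)) * scales
# "Instruct" vectors modeled as a small perturbation of the base vectors.
instruct = base + 0.05 * rng.normal(size=base.shape)

pc_sims = pc_cosines(top_pcs(base), top_pcs(instruct))
role_sims = role_cosines(base, instruct)
print("top-3 PC cosine similarities:", np.round(pc_sims, 3))
print("minimum role-vector cosine similarity:", round(float(role_sims.min()), 3))
```

Taking the absolute value of the PC dot products is a design choice: PCA directions are only defined up to sign, so a raw cosine of -0.93 and +0.93 would describe the same axis.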
Source paper
extracted_from(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- Motivated by near-identical PCs for base and instruct Gemma
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Validates that the contrast vector method and PCA-based PC1 capture the same direction
- Shows the leading component of persona space is model-universal
- Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
- High cosine similarity for Gemma3 steering vectors suggests strong linear reflection structure
- Characterizes model similarities and differences in secondary persona dimensions
- Appendix E replication of DIM alignment finding in Qwen model
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- Core result of Experiment 3: cross-model semantic convergence under self-referential processing