finding
active
finding:in-gemma-2-9b-only-the-first-cone-axis-v1-has-non-negligible-cosine-similarity-to-the-dim-direction-all-other-axes-have-near-zero-similarity-1e-9
In Gemma-2-9B, only the first cone axis (v1) has non-negligible cosine similarity to the DIM direction; all other axes have near-zero similarity (~1e-9)
Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
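The comparison behind this finding can be illustrated with a minimal sketch. It assumes that DIM denotes a difference-in-means direction (mean activation on true statements minus mean activation on false statements) and that the cone axes form an orthonormal set. The activations, the axes, and the helper names (`mean_vector`, `dim_direction`, `cosine`) are hypothetical toy values chosen for illustration, not data or code from the paper.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dim_direction(pos_acts, neg_acts):
    """Difference-in-means direction: mean(pos) - mean(neg).
    Assumption: this is what 'DIM' refers to in the finding."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "activations" (hypothetical): the two classes
# differ only along the first coordinate.
true_acts = [[2.0, 0.1, 0.0], [2.2, -0.1, 0.0]]
false_acts = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.0]]

# Toy orthonormal "cone axes" v1, v2, v3 (hypothetical).
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

d = dim_direction(true_acts, false_acts)
sims = [cosine(axis, d) for axis in axes]

for i, s in enumerate(sims, start=1):
    print(f"cos(v{i}, DIM) = {s:.3e}")
```

In this toy setup only the first axis aligns with the DIM direction and the remaining axes are orthogonal to it, which mirrors the qualitative pattern the finding reports; the actual magnitudes in Gemma-2-9B come from the paper's experiment, not from this sketch.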
Source paper
extracted_from · (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (2)
claim
- Truthful behavior in LLMs is not confined to a single linear axis; multiple orthogonal directions can independently mediate it · associated_with · Central interpretive claim of the paper
- Interpretation of Experiment 4 cosine similarity results
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Appendix E replication of DIM alignment finding in Qwen model
- Experiment 2 result showing large Gemma model supports high-dimensional truth cones
- Shows persona space axes are inherited from pre-training, not solely created by post-training
- Validates that the contrast vector method and PCA-based PC1 capture the same direction
- Core result of Experiment 3: cross-model semantic convergence under self-referential processing
- High cosine similarity for Gemma3 steering vectors suggests strong linear reflection structure
- Validates robustness of alignment metric choice
- Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.778 · E3 backbone generalization finding for Gemma; validates pattern across diverse architectures