finding

active

finding:in-gemma-2-9b-only-the-first-cone-axis-v1-has-non-negligible-cosine-similarity-to-the-dim-direction-all-other-axes-have-near-zero-similarity-1e-9

In Gemma-2-9B, only the first cone axis (v1) has non-negligible cosine similarity to the DIM direction; all other axes have near-zero similarity (~1e-9)

Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
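
A minimal sketch of how such a comparison could be computed, assuming the cone axes are stored as an orthonormal basis and the DIM direction is the difference between mean activations on true and false statements. All data here are synthetic and the helper names (dim_direction, orthonormal_basis_from) are hypothetical, not taken from the paper's code. For illustration only, v1 is constructed exactly parallel to the DIM direction; the source reports only that v1's similarity is non-negligible.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def dim_direction(true_acts, false_acts):
    """Difference-in-means direction: mean(true) - mean(false)."""
    return true_acts.mean(axis=0) - false_acts.mean(axis=0)


def orthonormal_basis_from(first, k, rng):
    """Hypothetical helper: k orthonormal vectors, the first parallel to `first`.

    Uses a QR decomposition of [first | random columns]; this is a toy
    stand-in for a learned cone basis, not the paper's training procedure.
    """
    d = first.shape[0]
    m = np.column_stack([first, rng.standard_normal((d, k - 1))])
    q, r = np.linalg.qr(m)
    # QR may return the first column with flipped sign; undo that.
    q[:, 0] *= np.sign(r[0, 0])
    return q.T  # shape (k, d), rows are v1..vk


rng = np.random.default_rng(0)
d = 64  # toy hidden size (assumption)

# Synthetic "activations" for true and false statements.
true_acts = rng.standard_normal((200, d)) + 0.5
false_acts = rng.standard_normal((200, d)) - 0.5

dim = dim_direction(true_acts, false_acts)
basis = orthonormal_basis_from(dim, k=5, rng=rng)

# Cosine similarity of each axis v1..v5 with the DIM direction.
sims = [cosine_similarity(v, dim) for v in basis]
for i, s in enumerate(sims, start=1):
    print(f"v{i}: {s:.3e}")
```

In this toy setup the axes after v1 are orthogonal to the DIM direction by construction, so their similarities vanish up to floating-point error. That mirrors the shape of the reported pattern without reproducing the experiment itself.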

Source paper

extracted_from

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs

(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4

Neighborhood — ranked by edge-count

Papers (1)

paper

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
introduces

Claims (2)

claim

Truthful behavior in LLMs is not confined to a single linear axis; multiple orthogonal directions can independently mediate it
associated_with
Central interpretive claim of the paper
DIM captures only one facet of the multi-dimensional truth subspace; additional orthogonal structure exists beyond it
supports
Interpretation of Experiment 4 cosine similarity results

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

In Qwen-2.5-9B, only v1 has meaningful cosine similarity to DIM direction; all additional basis vectors have cosine similarities ~1e-9 · finding · 0.850
Appendix E replication of DIM alignment finding in Qwen model
Gemma-2-9B achieves near-100% ASR (97.3–100%) across all cone dimensions 1–5 · finding · 0.820
Experiment 2 result showing large Gemma model supports high-dimensional truth cones
Base and instruct Gemma 2 27B role PCs have cosine similarities of 0.93, 0.87, 0.83 for the top 3 PCs respectively; role vector cosine similarities >0.99 for every role pair · finding · 0.816
Shows persona space axes are inherited from pre-training, not solely created by post-training
Cosine similarity between Assistant Axis and role PC1 is >0.60 at all layers and >0.71 at middle layer across all three models · finding · 0.796
Validates that the contrast vector method and PCA-based PC1 capture the same direction
Experimental condition adjective embeddings show mean cosine similarity 0.657 (n=9,591 pairs), significantly higher than history (0.628, t=15.8, p=1.4×10⁻⁵⁵), conceptual (0.587, t=38.5, p<10⁻³⁰⁰), and zero-shot (0.603, t=35.1, p=4.3×10⁻²⁶²) · finding · 0.782
Core result of Experiment 3: cross-model semantic convergence under self-referential processing
Top-5 instructions by µ(1→2) at ℓ=12 achieve average cosine similarity .9893 and average accuracy .5645 on gsm8k_adv for Gemma3-4B-IT · finding · 0.781
High cosine similarity for Gemma3 steering vectors suggests strong linear reflection structure.
Spearman's rank correlation among different alignment metrics (CKA, SVCCA, Mutual k-NN, CKNNA) over 78 vision models is high across variants, with all p-values below 2.24×10⁻¹⁰⁵ · finding · 0.778
Validates robustness of alignment metric choice
Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.778
E3 backbone generalization finding for Gemma; validates pattern across diverse architectures