finding

active

finding:gemma-2-2b-asr-drops-from-100-at-dims-1-2-to-43-1-at-dim-4-and-27-1-at-dim-5

Gemma-2-2B ASR drops from 100% at dims 1–2 to 43.1% at dim 4 and 27.1% at dim 5

Small Gemma model shows severe ASR degradation at higher cone dimensions

Source paper

extracted_from

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs

(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau · +4

Neighborhood — ranked by edge-count

Claims (1)

claim

Larger models can support higher-dimensional truth cones than smaller models
supports
Interpretation of ASR degradation patterns by model size across cone dimensions

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5 · finding · 0.863
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
Gemma-2-9B achieves near-100% ASR (97.3–100%) across all cone dimensions 1–5 · finding · 0.856
Experiment 2 result showing large Gemma model supports high-dimensional truth cones
All three Gemma-2 models show ESR rates below 1%, near indistinguishable from zero · finding · 0.832
Establishes potential Llama-family specificity or scale specificity of ESR phenomenon
Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.817
Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
Gemma-2-27B average generalization deceptive rate reduced from 98.4% ± 1.55% to 9.94% ± 6.83% · finding · 0.787
SOO fine-tuning generalized across 7 scenario variants for Gemma-2-27B
Gemma-2-27B MT-Bench score slightly decreased from 8.81 to 8.40 ± 0.15 after SOO fine-tuning · finding · 0.786
SOO fine-tuning caused a small decrease in Gemma-2-27B general capabilities
Gemma-2-27B Perspectives accuracy remains 100% after SOO fine-tuning · finding · 0.776
SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
Base and instruct Gemma 2 27B role PCs have cosine similarities of 0.93, 0.87, 0.83 for the top 3 PCs respectively; role vector cosine similarities >0.99 for every role pair · finding · 0.776
Shows persona space axes are inherited from pre-training, not solely created by post-training
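
The "Related by similarity" rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the page's actual implementation: the `Neighbor` class, the `untyped_candidates` function, the entity identifiers, and the two scores marked as hypothetical are all assumptions introduced for the example. Only the 0.65 threshold and the 0.863 and 0.856 scores come from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    entity_id: str  # hypothetical identifier, not the page's real slug
    cosine: float   # cosine similarity to the focal finding


def untyped_candidates(neighbors, typed_ids, threshold=0.65):
    """Keep neighbors at or above the threshold that have no typed edge,
    ordered from most to least similar."""
    kept = [
        n for n in neighbors
        if n.cosine >= threshold and n.entity_id not in typed_ids
    ]
    return sorted(kept, key=lambda n: n.cosine, reverse=True)


neighbors = [
    Neighbor("gemma-2-9b-asr", 0.856),        # score from the listing
    Neighbor("qwen-2.5-3b-asr", 0.863),       # score from the listing
    Neighbor("larger-models-claim", 0.900),   # hypothetical score; already linked via "supports"
    Neighbor("unrelated-finding", 0.400),     # hypothetical score; below the threshold
]
typed_ids = {"larger-models-claim"}

result = untyped_candidates(neighbors, typed_ids)
for n in result:
    print(n.entity_id, n.cosine)
```

Entities that already carry a typed edge, such as the claim linked by `supports`, are excluded regardless of similarity, so the result contains only candidates for new edges or unrecognized duplicates.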