finding
active
finding:gemma-2-2b-asr-drops-from-100-at-dims-1-2-to-43-1-at-dim-4-and-27-1-at-dim-5
Gemma-2-2B ASR drops from 100% at dims 1–2 to 43.1% at dim 4 and 27.1% at dim 5
Small Gemma model shows severe ASR degradation at higher cone dimensions
Source paper
extracted_from · (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of ASR degradation patterns by model size across cone dimensions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
- Experiment 2 result showing large Gemma model supports high-dimensional truth cones
- Establishes potential Llama-family specificity or scale specificity of ESR phenomenon
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.817 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
- Gemma-2-27B average generalization deceptive rate reduced from 98.4% ± 1.55% to 9.94% ± 6.83% · finding · 0.787 · SOO fine-tuning generalized across 7 scenario variants for Gemma-2-27B
- Gemma-2-27B MT-Bench score slightly decreased from 8.81 to 8.40 ± 0.15 after SOO fine-tuning · finding · 0.786 · SOO fine-tuning caused a small decrease in Gemma-2-27B general capabilities
- SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
- Shows persona space axes are inherited from pre-training, not solely created by post-training