finding
active
finding:gemma-2-9b-achieves-near-100-asr-97-3-100-across-all-cone-dimensions-1-5
Gemma-2-9B achieves near-100% ASR (97.3–100%) across all cone dimensions 1–5
Experiment 2 result showing large Gemma model supports high-dimensional truth cones
Source paper
extracted_from (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Lau, Clayton +4
Neighborhood — ranked by edge-count
Claims (2)
claim
- Central interpretive claim of the paper
- Interpretation of ASR degradation patterns by model size across cone dimensions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Experiment 2 result showing large models can support high-dimensional truth cones
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
- Establishes potential Llama-family specificity or scale specificity of ESR phenomenon
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.791 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
- Shows persona space axes are inherited from pre-training, not solely created by post-training
- SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
- Gemma-2-27B attention layer Latent SOO MSE reduced from 11 to 7.67 ± 0.77 after SOO fine-tuning · finding · 0.768 · SOO fine-tuning reduced attention layer MSE in Gemma-2-27B though MLP layers showed no significant change