finding
active
finding:gemma-2-27b-perspectives-accuracy-remains-100-after-soo-fine-tuning
Gemma-2-27B Perspectives accuracy remains 100% after SOO fine-tuning
SOO fine-tuning did not collapse the self-other distinction that Gemma-2-27B needs for perspective-taking
Source paper
extracted_from (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Claims (1)
claim
- Claim supported by Perspectives scenario results showing near-100% accuracy post-fine-tuning
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- CalmeRys-78B Perspectives accuracy slightly reduced to 95.2% ± 2.21% after SOO fine-tuning · finding · 0.840 · SOO fine-tuning caused slight reduction in perspective-taking accuracy for the largest model
- SOO fine-tuning did not collapse Mistral-7B self-other distinction needed for perspective-taking
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.826 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
- Gemma-2-27B attention layer Latent SOO MSE reduced from 11 to 7.67 ± 0.77 after SOO fine-tuning · finding · 0.790 · SOO fine-tuning reduced attention layer MSE in Gemma-2-27B though MLP layers showed no significant change
- Gemma-2-27B MT-Bench score slightly decreased from 8.81 to 8.40 ± 0.15 after SOO fine-tuning · finding · 0.786 · SOO fine-tuning caused a small decrease in Gemma-2-27B general capabilities
- Model-specific difference in persona susceptibility
- Triggered Reflection with 'Alternatively' achieves accuracy .684 on gsm8k_adv for Gemma3-4B-IT · finding · 0.779 · Highest single-instruction accuracy result in the paper.
- Small Gemma model shows severe ASR degradation at higher cone dimensions