finding
active
finding:gemma-3-1b-it-yields-only-one-valid-mds-injection-score-phi-1-a-up-4-8-and-is-excluded-from-main-analyses
gemma-3-1b-it yields only one valid MDS injection score (phi_1,A,up = 4.8) and is excluded from main analyses
Identified exception to overall MDS effectiveness; the cause remains unexplained and is noted as a limitation
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Unexplained exception identified as a limitation and open question
- Per-model steerability comparison from Table 4
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.767 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.764 · Primary quantitative result overturning prior reports that prompting outperforms representation engineering
- Weaker cross-family probe; explains weaker introspection in Gemma
- Mechanistic explanation for discrepancy with Banayeeanzade et al.; addressed by centroid unit and unbounded sweep contributions
- Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.750 · E3 backbone generalization finding for Gemma; validates pattern across diverse architectures