Gemma-2-2B-it

Smallest Gemma model tested, showing near-zero ESR

Neighborhood — ranked by edge-count

Gemma-2-9B-it
related_to
Medium Gemma model tested, showing near-zero ESR
Gemma-3-4B-it
related_to
Backbone model used in E3 robustness overlay
gemma-3-1b-it
related_to
Only model where MDS injections largely failed; excluded from main analyses
gemma-3-12b-it
related_to
12B Gemma model tested; used for openness linearity visualization (Figure 6)
gemma-3-27b-it
related_to
27B Gemma model quantized to 4-bit NF4; tested in OCEAN benchmarks

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
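
The candidate filter described above can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as unordered name pairs. The function names, the toy entity names, and the toy vectors are all hypothetical, and the actual pipeline behind this page may compute similarity differently.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold and no typed edge
    to `target`, sorted by descending score."""
    # Collect every entity that already shares a typed edge with the target.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: hypothetical 2-D embeddings, not values from this page.
embeddings = {
    "entity_a": [1.0, 0.0],
    "entity_b": [0.8, 0.6],    # similar, but already linked by a typed edge
    "entity_c": [0.0, 1.0],    # orthogonal, below the threshold
    "entity_d": [0.96, 0.28],  # similar and unlinked, so a candidate
}
typed_edges = [("entity_a", "entity_b")]

candidates = untyped_neighbors("entity_a", embeddings, typed_edges)
```

In this toy setup only `entity_d` is returned: `entity_b` is excluded because it already has a typed edge, and `entity_c` falls below the 0.65 threshold.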

Gemma-2-2B ASR drops from 100% at dims 1–2 to 43.1% at dim 4 and 27.1% at dim 5 · finding · 0.746
Small Gemma model shows severe ASR degradation at higher cone dimensions
Gemma 2 27B is unlikely to take on human personas when steered away from Assistant, preferring nonhuman or theatrical portrayals · finding · 0.739
Model-specific difference in persona susceptibility
Gemma 2: Improving Open Language Models at a Practical Size (Team et al., 2024) · concept · 0.736
Paper describing Gemma 2 model family used in this study
GemmaScope SAEs · concept · 0.711
SAEs trained on pretrained Gemma-2 models used for steering in Gemma family experiments
Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.710
Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.694
E3 backbone generalization finding for Gemma; validates pattern across diverse architectures
Gemma-2-27B Perspectives accuracy remains 100% after SOO fine-tuning · finding · 0.694
SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
Gemma-2-9B achieves near-100% ASR (97.3–100%) across all cone dimensions 1–5 · finding · 0.693
Experiment 2 result showing large Gemma model supports high-dimensional truth cones