concept
active
concept:gemma-3-4b-it · Gemma-3-4B-it
Backbone model used in E3 robustness overlay.
Neighborhood — ranked by edge-count
Methods (1)
- Quantitative study correlating layer-wise anchoring geometry (S_max, AUS_N) with behavioral thresholds θ50
Concepts (5)
- Gemma-2-2B-it · related_to · Smallest Gemma model tested, showing near-zero ESR
- Gemma-2-9B-it · related_to · Medium Gemma model tested, showing near-zero ESR
- gemma-3-1b-it · related_to · Only model where MDS injections largely failed; excluded from main analyses
- gemma-3-12b-it · related_to · 12B Gemma model tested; used for openness linearity visualization (Figure 6)
- gemma-3-27b-it · related_to · 27B Gemma model quantized to 4-bit NF4; tested in OCEAN benchmarks
Findings (1)
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · associated_with · Task-specific E3 finding showing compositional reasoning requires deeper processing
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.757 · E3 backbone generalization finding for Gemma; validates pattern across diverse architectures
- Weaker but still significant introspective coupling in Gemma model; consistent with lower probe quality
- Paper describing Gemma 2 model family used in this study
- Weaker cross-family probe; explains weaker introspection in Gemma
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.727 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- SAEs trained on pretrained Gemma-2 models used for steering in Gemma family experiments
- SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking