finding

active

finding:gemma-3-4b-it-shows-three-stage-layer-trajectory-and-s-l-peak-despite-scale-differences-in-dr-and-d

Gemma-3-4B-it shows the three-stage layer trajectory and an S(ℓ) peak despite scale differences in dr and ρd

E3 backbone generalization finding for Gemma; validates pattern across diverse architectures

Source paper

extracted_from

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring

(2025) · Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang

Neighborhood — ranked by edge-count

Concepts (1)

concept

Three-Stage Layer Trajectory
supports
Empirically observed pattern in E3: early enrichment (ρd dips), mid-layer alignment (dr falls), late standardization (re-clustering)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

For simple factual tasks F0-F3, probe directions show a sharp geometric transition in middle layers, with late-layer probes converging to high cosine similarity; A3 and F4-F5 show no clear transition. · finding · 0.788
Geometric evidence for convergence to stable truth directions only for simpler tasks.
In Gemma-2-9B, only the first cone axis (v1) has non-negligible cosine similarity to the DIM direction; all other axes have near-zero similarity (~1e-9) · finding · 0.778
Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models · finding · 0.773
Experiment 1 finding localizing where truth can be causally mediated
Gemma 3 4B wellbeing probe: peak Cohen's d=1.8 · finding · 0.772
Weaker cross-family probe; explains weaker introspection in Gemma
Do layer-wise geometric signatures (τ_peak, AUS_N) correlate with behavioral thresholds (k50)? · question · 0.762
E3 research question testing whether internal representations provide a geometry-to-behavior bridge
Gemma-2-27B attention layer Latent SOO MSE reduced from 11 to 7.67 ± 0.77 after SOO fine-tuning · finding · 0.760
SOO fine-tuning reduced attention layer MSE in Gemma-2-27B though MLP layers showed no significant change
The difficulty boundary for truth directions replicates across all four tested models (Llama-3.2-3B, Llama-3.1-8B, Gemma-2-2b, Gemma-2-9b); generalization to F3-F5 remains consistently low regardless of model size or family. · finding · 0.760
Establishes generalizability of the core difficulty-boundary finding across model families.
Gemma-2-27B Perspectives accuracy remains 100% after SOO fine-tuning · finding · 0.759
SOO fine-tuning did not collapse Gemma-2-27B self-other distinction needed for perspective-taking
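
The "related by similarity" list above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as unordered pairs of entity ids. The function names, the data layout, and the toy vectors are hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(query_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs with score >= threshold, best first.

    embeddings:  dict mapping entity id -> list of floats (assumed layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a
                 typed relation and are therefore excluded (assumed layout)
    """
    query_vec = embeddings[query_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == query_id:
            continue  # skip the entity itself
        if frozenset((query_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge
        score = cosine(query_vec, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data, purely illustrative.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.1],
    "c": [0.0, 1.0],
    "d": [1.0, 1.0],
}
toy_edges = {frozenset(("a", "d"))}
related = similar_without_edge("a", toy_embeddings, toy_edges)
```

In this toy data, "d" clears the threshold but is dropped because it already has a typed edge to "a", and "c" is dropped because its similarity is below 0.65.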