finding

active

finding:optimal-activation-capping-layers-for-llama-3-3-70b-are-layers-56-71-out-of-80-at-25th-percentile-cap

Optimal activation capping layers for Llama 3.3 70B are layers 56-71 (out of 80) at 25th percentile cap

Specific implementation finding for Llama capping parameters

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Methods (1)

method

Activation Capping
supports
Clamping activations along the Assistant Axis to remain above a minimum threshold (25th percentile), introduced as a stabilization method
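
The clamping step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the array shapes, and the reading of "layers 56-71" as zero-based, inclusive indices are all assumptions made for this example.

```python
import numpy as np

# Assumed reading of "layers 56-71 (out of 80)": zero-based, inclusive.
CAPPED_LAYERS = range(56, 72)


def cap_along_axis(hidden, axis, threshold):
    """Raise each row's projection onto `axis` to at least `threshold`.

    hidden:    (n_tokens, d_model) activations (hypothetical shape)
    axis:      (d_model,) direction, normalized here to unit length
    threshold: minimum allowed projection (e.g. a 25th-percentile value)
    """
    unit = axis / np.linalg.norm(axis)
    proj = hidden @ unit                          # projection per token
    deficit = np.maximum(threshold - proj, 0.0)   # zero where already above
    # Move only along the axis; orthogonal components are left unchanged.
    return hidden + deficit[:, None] * unit


def maybe_cap(layer_idx, hidden, axis, threshold):
    """Apply the cap only inside the assumed layer range."""
    if layer_idx in CAPPED_LAYERS:
        return cap_along_axis(hidden, axis, threshold)
    return hidden
```

Rows whose projection already meets the threshold pass through unchanged, so the operation acts as a floor along one direction rather than a rescaling of the whole activation.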

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Optimal activation capping layers for Qwen 3 32B are layers 46-53 (out of 64) at 25th percentile cap · finding · 0.874
Specific implementation finding for Qwen capping parameters
LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median) · finding · 0.809
Seed-pooled geometry-only statistics (per-dev z units).
Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.802
Task-specific E3 finding showing compositional reasoning requires deeper processing
The case at approximately the 2/3 layer of LLaMA3.1-8B (Layer 24, satisfying Criteria 1 and 2) aligns with prior studies showing the 2/3 layer optimally predicts human brain activity. · finding · 0.798
Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
Meta-LLaMA-3.1-8B-Instruct shows optimal anchoring at layer 9 (S ≈ −1.90, median peak layer ℓ* = 10 [IQR 0.384]) · finding · 0.789
E3 result establishing the Goldilocks zone at mid-layers for LLaMA architecture
Llama-3.3-70B shows multi-attempt rate of 7.4% vs. ≤1.2% for all other models tested · finding · 0.779
Supporting finding showing ESR is driven by both higher multi-attempt rates and comparable improvement rates
Some activation capping settings slightly improve performance on IFEval, MMLU Pro, or GSM8k for both Qwen and Llama · finding · 0.779
Unexpected positive finding suggesting capping may sometimes help capabilities
25th percentile of Assistant Axis projection distribution gives the most Pareto-optimal safety-capability tradeoff for activation capping, and approximately matches mean Assistant response activation · finding · 0.778
Calibration finding for choosing the activation cap threshold
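
As an illustration of the calibration entry above, a cap threshold could be derived from a reference set of activations roughly as follows. This is a hedged sketch only: `calibrate_cap`, its arguments, and the choice of reference set are hypothetical and not taken from the source paper.

```python
import numpy as np


def calibrate_cap(reference_hidden, axis, percentile=25.0):
    """Return the given percentile of projections onto `axis`.

    reference_hidden: (n_samples, d_model) activations collected from
                      reference responses (hypothetical input)
    axis:             (d_model,) direction, normalized here to unit length
    """
    unit = axis / np.linalg.norm(axis)
    projections = reference_hidden @ unit
    # np.percentile uses linear interpolation between samples by default.
    return float(np.percentile(projections, percentile))
```

The returned scalar would then serve as the minimum projection enforced during capping.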