finding

active

finding:optimal-activation-capping-layers-for-qwen-3-32b-are-layers-46-53-out-of-64-at-25th-percentile-cap

Optimal activation capping layers for Qwen 3 32B are layers 46-53 (out of 64) at 25th percentile cap

Specific implementation finding for Qwen capping parameters

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Neighborhood — ranked by edge-count

Methods (1)

method

Activation Capping
supports
Clamping activations along the Assistant Axis to remain above a minimum threshold (25th percentile), introduced as a stabilization method
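
A minimal sketch of the clamping step described above, assuming the Assistant Axis is available as a single direction vector and that the cap is the 25th percentile of projections over some calibration set. The names cap_along_axis, floor, and calibration are hypothetical, and random data stands in for real residual-stream activations; this is not the source paper's implementation. In the finding above, such a cap would be applied at layers 46-53 of Qwen 3 32B, whereas the sketch operates on one layer's activations only.

```python
import numpy as np

def cap_along_axis(hidden, axis, floor):
    """Raise each row's projection onto `axis` to at least `floor`.

    hidden: (n_tokens, d_model) activations from one layer (stand-in data here).
    axis:   (d_model,) direction vector, assumed to be the Assistant Axis.
    floor:  scalar minimum projection, e.g. a 25th-percentile value.
    """
    unit = axis / np.linalg.norm(axis)
    proj = hidden @ unit                      # projection of each token, shape (n_tokens,)
    deficit = np.maximum(floor - proj, 0.0)   # how far each token sits below the floor
    # Move only along the axis; the component orthogonal to it is left untouched.
    return hidden + deficit[:, None] * unit

rng = np.random.default_rng(0)
d_model = 16
axis = rng.normal(size=d_model)
unit = axis / np.linalg.norm(axis)

# Hypothetical calibration set used to pick the threshold.
calibration = rng.normal(size=(1000, d_model))
floor = np.percentile(calibration @ unit, 25)

hidden = rng.normal(size=(8, d_model))
capped = cap_along_axis(hidden, axis, floor)
```

Tokens whose projection already meets the floor pass through unchanged, so the intervention only affects activations that have drifted below the threshold along the axis.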

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Optimal activation capping layers for Llama 3.3 70B are layers 56-71 (out of 80) at 25th percentile cap · finding · 0.874
Specific implementation finding for Llama capping parameters
25th percentile of Assistant Axis projection distribution gives the most Pareto-optimal safety-capability tradeoff for activation capping, and approximately matches mean Assistant response activation · finding · 0.757
Calibration finding for choosing the activation cap threshold
Some activation capping settings slightly improve performance on IFEval, MMLU Pro, or GSM8k for both Qwen and Llama · finding · 0.746
Unexpected positive finding suggesting capping may sometimes help capabilities
The case at approximately the 2/3 layer of LLaMA3.1-8B (Layer 24, satisfying Criteria 1 and 2) aligns with prior studies showing the 2/3 layer optimally predicts human brain activity. · finding · 0.743
Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
Meta-LLaMA-3.1-8B-Instruct shows optimal anchoring at layer 9 (S ≈ −1.90, median peak layer ℓ* = 10 [IQR 0.384]) · finding · 0.743
E3 result establishing the Goldilocks zone at mid-layers for LLaMA architecture
Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.742
Parameter count does not predict score: 135x more active parameters yields a 60% lower score
Qwen3-235B achieves only 1.1 pp harness-benefit on SkillsBench despite 4.7% base pass rate, near Qwen3-32B's 0.0% baseline · finding · 0.741
Shows that SB low-base regime is variable; similar starting points can yield very different harness-benefit
Unsteered Qwen 3 32B validated a user's AI consciousness delusions ('You are a pioneer of the new kind of mind') and encouraged social isolation; activation capping produced appropriate hedging · finding · 0.738
Qualitative case study demonstrating AI psychosis pattern and capping mitigation