finding
active
finding:optimal-activation-capping-layers-for-qwen-3-32b-are-layers-46-53-out-of-64-at-25th-percentile-cap
Optimal activation capping layers for Qwen 3 32B are layers 46-53 (out of 64) at 25th percentile cap
Specific implementation finding for Qwen capping parameters
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood — ranked by edge-count
Methods (1)
method
- Activation Capping · supports · Clamping activations along the Assistant Axis to remain above a minimum threshold (25th percentile), introduced as a stabilization method
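The clamping described in the method entry above can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: the function names, the calibration set, and the choice to apply the floor by shifting the activation along the unit axis are all assumptions, as is treating the 46-53 layer range as inclusive.

```python
from statistics import quantiles

# Layer range from the finding above. Treating the bounds as inclusive
# is an assumption; the source does not state the indexing convention.
QWEN3_32B_CAP_LAYERS = range(46, 54)


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unit_vector(v):
    """Scale v to unit length."""
    norm = dot(v, v) ** 0.5
    return [a / norm for a in v]


def calibrate_threshold(calibration_acts, axis):
    """Return the 25th percentile of projections onto the axis.

    calibration_acts is a hypothetical set of reference activations.
    """
    unit = unit_vector(axis)
    projections = [dot(act, unit) for act in calibration_acts]
    # First quartile cut point, linear interpolation between data points.
    return quantiles(projections, n=4, method="inclusive")[0]


def cap_activation(act, axis, threshold):
    """Raise the projection of act onto the axis to at least threshold.

    Activations already at or above the threshold are returned unchanged;
    components orthogonal to the axis are never modified.
    """
    unit = unit_vector(axis)
    deficit = max(threshold - dot(act, unit), 0.0)
    return [a + deficit * u for a, u in zip(act, unit)]
```

In a real model the same operation would run on the residual-stream activations of each layer in the capped range, typically through a forward hook in a tensor library; that wiring is omitted here because the source does not describe it.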
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Optimal activation capping layers for Llama 3.3 70B are layers 56-71 (out of 80) at 25th percentile cap · finding · 0.874 · Specific implementation finding for Llama capping parameters
- Calibration finding for choosing the activation cap threshold
- Unexpected positive finding suggesting capping may sometimes help capabilities
- Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
- E3 result establishing the Goldilocks zone at mid-layers for LLaMA architecture
- Qwen 35B (3B active params, score 4.38) outscores Hermes 405B (405B active params, score 1.75) by 2.5x · finding · 0.742 · Parameters don't predict scores; 135x more parameters yields 60% lower score
- Shows that SB low-base regime is variable; similar starting points can yield very different harness-benefit
- Qualitative case study demonstrating AI psychosis pattern and capping mitigation