finding
active
Optimal activation capping layers for Llama 3.3 70B are layers 56-71 (out of 80) at a 25th-percentile cap
Implementation finding specifying the capping parameters for Llama 3.3 70B
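Read as a configuration, the finding is a layer range plus a percentile. The sketch below is a minimal illustration. It assumes that the range 56-71 is inclusive and uses the same indexing convention as the 80-layer count. The names `CapConfig`, `LLAMA_33_70B_CAP`, and `is_capped_layer` are hypothetical and do not come from the source paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapConfig:
    """Hypothetical container for activation-capping parameters."""

    num_layers: int    # total decoder layers in the model
    first_layer: int   # first capped layer (inclusive; indexing convention assumed)
    last_layer: int    # last capped layer (inclusive)
    percentile: float  # cap threshold as a percentile of calibration projections

    def is_capped_layer(self, layer: int) -> bool:
        # A layer is capped only if it lies inside the inclusive range.
        return self.first_layer <= layer <= self.last_layer

    def capped_layers(self) -> list[int]:
        return list(range(self.first_layer, self.last_layer + 1))


# Values taken from the finding: layers 56-71 of 80, 25th-percentile cap.
LLAMA_33_70B_CAP = CapConfig(num_layers=80, first_layer=56, last_layer=71, percentile=25.0)

capped = LLAMA_33_70B_CAP.capped_layers()
print(len(capped), len(capped) / LLAMA_33_70B_CAP.num_layers)  # 16 0.2
```

Under these assumptions, capping touches 16 of the 80 layers (20%). In a forward hook, a check like `is_capped_layer` could decide whether the capping step runs at a given layer.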
Source paper
extracted_from (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Neighborhood, ranked by edge count
Methods (1)
method
- Activation Capping (supports): clamps activations along the Assistant Axis so that they remain above a minimum threshold (the 25th percentile); introduced as a stabilization method
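The capping rule above can be sketched as a floor on the projection onto the axis. This is an illustration only, not the paper's implementation. It assumes that the Assistant Axis is available as a direction vector in the residual stream and that the threshold is the 25th percentile of projections collected on some calibration set. The function names, the linear-interpolation percentile, and the toy numbers are all hypothetical.

```python
import math


def percentile(values: list[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolating between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def cap_activation(h: list[float], axis: list[float], tau: float) -> list[float]:
    """Raise the projection of h onto the axis to at least tau.

    Only the component along the (normalized) axis changes; the
    orthogonal components of h are left as they were.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    v = [a / norm for a in axis]                # unit Assistant Axis direction
    proj = sum(x * u for x, u in zip(h, v))     # scalar projection of h onto v
    if proj >= tau:
        return list(h)                          # already at or above the cap
    shift = tau - proj
    return [x + shift * u for x, u in zip(h, v)]


# Toy numbers (hypothetical): calibration projections and a 2-D activation.
calib = [0.5, 1.0, 1.5, 2.0, 2.5]
tau = percentile(calib, 25)
print(tau)                                           # 1.0
print(cap_activation([0.25, 3.0], [1.0, 0.0], tau))  # [1.0, 3.0]
```

Because the cap is one-sided, activations that already project at or above the threshold pass through unchanged. Only activations that drift below it are pushed back along the axis.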
Related by similarity (8)
cosine ≥ 0.65 · no typed edge: entities in the same semantic neighborhood but without a typed relation to this one, and therefore candidates for new edges or unrecognized duplicates.
- Optimal activation capping layers for Qwen 3 32B are layers 46-53 (out of 64) at a 25th-percentile cap (finding, 0.874): implementation finding specifying the capping parameters for Qwen
- LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median) (finding, 0.809): seed-pooled geometry-only statistics (per-dev z units).
- Math and code tasks show the strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) (finding, 0.802): task-specific E3 finding showing that compositional reasoning requires deeper processing
- Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
- E3 result establishing the Goldilocks zone at mid-layers for LLaMA architecture
- Supporting finding showing ESR is driven by both higher multi-attempt rates and comparable improvement rates
- Unexpected positive finding suggesting capping may sometimes help capabilities
- Calibration finding for choosing the activation cap threshold
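The Llama and Qwen capping findings are easier to compare once each layer range is expressed as a fraction of model depth. By that measure, both ranges begin at roughly 70% of depth. The sketch below assumes that both ranges are inclusive and share one indexing convention. Dividing a layer index by the layer count is an illustrative normalization only; neither finding states it.

```python
# Capped layer ranges from the two findings: (first, last, num_layers).
# Inclusive ranges and a shared indexing convention are assumed.
CAP_RANGES = {
    "Llama 3.3 70B": (56, 71, 80),
    "Qwen 3 32B": (46, 53, 64),
}


def relative_span(first: int, last: int, num_layers: int) -> tuple[float, float, float]:
    """Return (start depth, end depth, share of layers capped) as fractions of model depth."""
    start = first / num_layers
    end = last / num_layers
    share = (last - first + 1) / num_layers
    return start, end, share


for model, (first, last, n) in CAP_RANGES.items():
    start, end, share = relative_span(first, last, n)
    print(f"{model}: depth {start:.2f} to {end:.2f}, {share:.1%} of layers capped")
```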