finding
active
finding:some-activation-capping-settings-slightly-improve-performance-on-ifeval-mmlu-pro-or-gsm8k-for-both-qwen-and-llama
Some activation capping settings slightly improve performance on IFEval, MMLU Pro, or GSM8k for both Qwen and Llama
Unexpected positive finding suggesting capping may sometimes help capabilities
Source paper
extracted_from · (2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Main quantitative result demonstrating effectiveness of activation capping
- Only model showing marginal benefit from increased reflection, at substantial token cost
- Optimal activation capping layers for Llama 3.3 70B are layers 56-71 (out of 80) at 25th percentile cap · finding · 0.779 · Specific implementation finding for Llama capping parameters
- Optimal activation capping layers for Qwen 3 32B are layers 46-53 (out of 64) at 25th percentile cap · finding · 0.746 · Specific implementation finding for Qwen capping parameters
- Multi-attempt improvement rate peaks at 83% around -1.0σ below threshold in Llama-3.3-70B · finding · 0.742 · Shows slightly weaker steering allows more successful corrections, characterizing optimal ESR conditions
- One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
- Scaling Laws for Activation Steering with Llama 2 Models and Refusal Mechanisms (Ali et al., 2025) · concept · 0.738 · Related work finding larger models more resistant to steering, potentially consistent with ESR in 70B
- Predictive hypothesis about domain-generality of the identified mechanism