finding

active

finding:some-activation-capping-settings-slightly-improve-performance-on-ifeval-mmlu-pro-or-gsm8k-for-both-qwen-and-llama

Some activation capping settings slightly improve performance on IFEval, MMLU Pro, or GSM8k for both Qwen and Llama

Unexpected positive finding suggesting capping may sometimes help capabilities

Source paper

extracted_from

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

(2026) · Christina Lu · Jack Gallagher · Jonathan Michala · Kyle Fish +1

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Activation capping reduces harmful response rate by nearly 60% without impacting performance on IFEval, MMLU Pro, GSM8k, and EQ-Bench · finding · 0.833
Main quantitative result demonstrating effectiveness of activation capping
DeepSeek-R1 Llama 8b gains 0.16% accuracy on GSM8k with positive intervention (more reflections) at cost of ~2000 additional tokens · finding · 0.779
Only model showing marginal benefit from increased reflection, at substantial token cost
Optimal activation capping layers for Llama 3.3 70B are layers 56-71 (out of 80) at 25th percentile cap · finding · 0.779
Specific implementation finding for Llama capping parameters
Optimal activation capping layers for Qwen 3 32B are layers 46-53 (out of 64) at 25th percentile cap · finding · 0.746
Specific implementation finding for Qwen capping parameters
Multi-attempt improvement rate peaks at 83% around -1.0σ below threshold in Llama-3.3-70B · finding · 0.742
Shows slightly weaker steering allows more successful corrections, characterizing optimal ESR conditions
Layer 24 (indexed at 8) of LLaMA3.1-8B on Hinting satisfies Criteria 1 and 2 under both IIT 3.0 and IIT 4.0 (temporal permutation). · finding · 0.739
One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
Scaling Laws for Activation Steering with Llama 2 Models and Refusal Mechanisms (Ali et al., 2025) · concept · 0.738
Related work finding larger models more resistant to steering, potentially consistent with ESR in 70B
We hypothesize that Llama-3.1-8B deploys the same base-10 addition circuitry for cyclic reasoning as it uses for general arithmetic, independent of the concept domain · hypothesis · 0.735
Predictive hypothesis about domain-generality of the identified mechanism