finding
active
finding:multi-attempt-improvement-rate-peaks-at-83-around-1-0-below-threshold-in-llama-3-3-70b
Multi-attempt improvement rate peaks at 83% around -1.0σ below threshold in Llama-3.3-70B
Shows that slightly weaker steering allows more successful corrections, characterizing the optimal conditions for ESR
Source paper
extracted_from (2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Supporting finding showing ESR is driven by both higher multi-attempt rates and comparable improvement rates
- Demonstrates ESR can be deliberately enhanced through prompting in the largest model
- Shows behavioral pattern of self-correction is trainable in smaller models
- Quantitative characterization of ESR operating regime in boost level sweep
- Ablating 26 OTD latents reduces multi-attempt rate by 25% (from 7.4% to 5.5%) in Llama-3.3-70B · finding · 0.812 · Primary causal evidence for dedicated internal consistency-checking circuits
- Illustrative finding that ESR mitigates but does not fully eliminate steering influence
- LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median) · finding · 0.797 · Seed-pooled geometry-only statistics (per-dev z units).
- Evidence that OTDs specifically support meta-cognitive monitoring rather than general response generation