finding
active
Random latent ablation produces slight increase in ESR rate (3.8% to 4.2%), not statistically significant
Control result confirming OTD ablation effect is specific to those latents, not a general ablation artifact
Source paper
extracted_from · (2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Causal interpretation of the ablation experiment results
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Acknowledges incompleteness of the causal account, suggesting redundant circuits or nonlinear interactions
- Prior finding from related work that aligns with ESR being strongest in the largest model tested
- Ablating 26 OTD latents reduces multi-attempt rate by 25% (from 7.4% to 5.5%) in Llama-3.3-70B · finding · 0.786 · Primary causal evidence for dedicated internal consistency-checking circuits
- Control experiment ablating random latents matched for activation frequency and magnitude to test OTD specificity
- Evidence that OTDs specifically support meta-cognitive monitoring rather than general response generation
- Gradient-based attribution approximates ablation impact, enabling fast search for causally important features.
- Characterizes the narrow operating window in which ESR can manifest
- Distinguishes ESR from prior work on model self-repair
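
The two ablation results quoted on this page can be compared on a common scale. The sketch below computes the relative change for each pair of rates and shows how a two-proportion z-test would be applied to the random-ablation control. Only the four rates come from this page. The sample size `N_HYPOTHETICAL` is an assumption made for illustration, since the page does not report how many trials were run, and the z-test is a generic choice that is not confirmed to be the test used in the source paper.

```python
import math


def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    return (after - before) / before


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of a standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Rates quoted on this page.
otd_change = relative_change(0.074, 0.055)     # OTD ablation: about -25.7%
random_change = relative_change(0.038, 0.042)  # random ablation: about +10.5%

# ASSUMPTION: hypothetical per-condition sample size, not reported on this page.
N_HYPOTHETICAL = 1000
z, p = two_proportion_z(0.038, 0.042, N_HYPOTHETICAL, N_HYPOTHETICAL)

print(f"OTD ablation relative change:    {otd_change:+.1%}")
print(f"Random ablation relative change: {random_change:+.1%}")
print(f"Random ablation z = {z:.2f}, p = {p:.2f} (at hypothetical n = {N_HYPOTHETICAL})")
```

Whether a 0.4 percentage-point difference counts as significant depends on the actual number of trials, so the p-value printed here only illustrates the calculation and should not be read as the paper's result.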