claim
active
The 26 differentially-activated OTD latents play a causally important role in enabling ESR in Llama-3.3-70B
Causal interpretation of the ablation experiment results
Source paper
extracted_from · (2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (4)
- Ablating 26 OTD latents reduces multi-attempt rate by 25% (from 7.4% to 5.5%) in Llama-3.3-70B · associated_with · supports · Primary causal evidence for dedicated internal consistency-checking circuits
- Temporal pattern consistent with internal monitoring process preceding explicit self-correction
- Control result confirming OTD ablation effect is specific to those latents, not a general ablation artifact
- Evidence that OTDs specifically support meta-cognitive monitoring rather than general response generation
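The 25% figure in the first finding above reads most naturally as a relative reduction rather than a drop in percentage points. The short check below is a sketch under that assumption; the function and variable names are illustrative and not taken from the source paper, and the only inputs are the two rounded rates quoted in the finding.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`, relative to `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Rounded rates quoted in the finding, in percent.
baseline_rate = 7.4  # multi-attempt rate without ablation
ablated_rate = 5.5   # multi-attempt rate with the 26 OTD latents ablated

absolute_drop = baseline_rate - ablated_rate  # in percentage points
relative_drop = relative_reduction(baseline_rate, ablated_rate)

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
```

With the rounded inputs this gives a drop of 1.9 percentage points, or about 25.7% in relative terms, which is in line with the reported 25%. The small gap is presumably due to rounding in the quoted rates.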
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Acknowledges incompleteness of the causal account, suggesting redundant circuits or nonlinear interactions
- Prior finding from related work that aligns with ESR being strongest in the largest model tested
- Reveals that contrastive search yields a heterogeneous set, not all functioning as true off-topic detectors
- We cannot isolate whether ESR reflects scale, architecture, or training procedures in Llama-3.3-70B · claim · 0.761 · Epistemic limitation claim acknowledging confounds in the cross-model comparison
- 26 candidate off-topic detector latents identified in Llama-3.3-70B via contrastive search · finding · 0.758 · Core mechanistic finding identifying specific SAE latents associated with ESR
- Characterizes the narrow operating window in which ESR can manifest
- Meta-prompt ESR enhancement effects scale with model size across Llama and Gemma families · finding · 0.747 · Suggests underlying self-monitoring circuits must be present for meta-prompting to enhance them
- Empirical demonstration that MDVP produces divergent representations in a real LLM
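The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the actual implementation behind this page: the entity names, the two-dimensional embeddings, and the `related_by_similarity` helper are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above `threshold` similarity to `target` with no typed edge to it."""
    hits = []
    for name, vector in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target by a typed relation.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: made-up entity names and embeddings.
embeddings = {
    "target": [1.0, 0.0],
    "near": [0.9, 0.1],
    "linked": [1.0, 0.05],
    "far": [0.0, 1.0],
}
typed_edges = {frozenset(("target", "linked"))}

related = related_by_similarity("target", embeddings, typed_edges)
print(related)
```

In this toy setup only `near` is returned: `linked` is similar but already has a typed edge, and `far` falls below the threshold.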