finding
active
finding:ali-et-al-2025-found-contrastive-activation-addition-less-effective-at-larger-model-scale-consistent-with-esr-in-70b
Ali et al. 2025 found contrastive activation addition less effective at larger model scale, consistent with ESR in 70B
Prior finding from related work that aligns with ESR being strongest in the largest model tested
Source paper
extracted_from · (2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
Neighborhood — ranked by edge-count
Claims (1)
claim
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · supports · Central interpretive claim of the paper supported by causal ablation and activation evidence
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Performance gains over CAA in steering tasks.
- Random latent ablation produces slight increase in ESR rate (3.8% to 4.2%), not statistically significant · finding · 0.786 · Control result confirming OTD ablation effect is specific to those latents, not a general ablation artifact
- Characterizes the narrow operating window in which ESR can manifest
- Causal interpretation of the ablation experiment results
- Acknowledges incompleteness of the causal account, suggesting redundant circuits or nonlinear interactions
- Systematic comparison showing features are substantially more universal than neurons across models
- An existing activation steering method used as comparative baseline.