finding
active
All five judge models consistently rank Llama-3.3-70B as having substantially higher ESR rates than other models
Cross-judge validation of the primary ESR finding across OpenAI, Alibaba, Anthropic, and Google judge models
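A cross-judge consistency check of this kind can be sketched as below. This is a minimal illustration, not the paper's procedure. It assumes ESR rates are available as a mapping from judge name to per-model rate. The function name `target_ranked_first_by_all` and the `margin` parameter are hypothetical.

```python
from typing import Mapping


def target_ranked_first_by_all(
    rates: Mapping[str, Mapping[str, float]],
    target: str,
    margin: float = 0.0,
) -> bool:
    """Return True if every judge scores `target` above all other models.

    `rates[judge][model]` is a hypothetical ESR rate in [0, 1] that `judge`
    assigns to `model`. `margin` is an assumed minimum gap between the target
    and the best competing model for the lead to count.
    """
    if not rates:
        return False
    for per_model in rates.values():
        if target not in per_model:
            return False
        others = [rate for model, rate in per_model.items() if model != target]
        # With no competing models there is nothing to rank against.
        if not others:
            return False
        # The target must beat the strongest competitor by more than `margin`.
        if per_model[target] - max(others) <= margin:
            return False
    return True
```

Passing, for example, `margin=0.05` would require the target to lead each judge's next-best model by more than five percentage points. That threshold is an illustrative choice, not one taken from the study.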
Source paper
extracted_from · 2026 · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab · +5 more
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (2)
claim
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · supports · Central interpretive claim of the paper, supported by causal ablation and activation evidence
- We cannot isolate whether ESR reflects scale, architecture, or training procedures in Llama-3.3-70B · supports · Epistemic limitation claim acknowledging confounds in the cross-model comparison
Concepts (1)
concept
- Primary model of interest showing substantial ESR; largest model tested in the study
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
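The similarity rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration under stated assumptions. The page does not say how its entity embeddings are produced, and the `embeddings` dict, the `typed_edges` set, and both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(
    entity: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold and no typed edge in either direction.

    Results are sorted by descending similarity.
    """
    query = embeddings[entity]
    hits = []
    for other, vec in embeddings.items():
        if other == entity:
            continue
        # Skip pairs already linked by a typed relation (e.g. "supports").
        if (entity, other) in typed_edges or (other, entity) in typed_edges:
            continue
        score = cosine(query, vec)
        if score >= threshold:
            hits.append((other, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits
```

Checking edges in both directions treats typed relations as undirected for this filter. That is a design assumption: a stricter reading would only exclude edges that originate at `entity`.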
- Supporting finding showing ESR is driven by both higher multi-attempt rates and comparable improvement rates
- Characterizes the narrow operating window in which ESR can manifest
- Larger models linearly represent more general concepts including truth
- Illustrative finding that ESR mitigates but does not fully eliminate steering influence
- Replication across open-weight models supports scale-emergence finding
- Establishes potential Llama-family specificity or scale specificity of ESR phenomenon
- Only model showing marginal benefit from increased reflection, at substantial token cost
- Greedy-decoded self-reports in LLaMA-3.2-3B collapse to 1.1–3.9 distinct values on a 10-point scale · finding · 0.781 · Demonstrates that default decoding masks introspective capacity; entropy 0.03–1.10 bits
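The entropy figures in the last entry are most plausibly read as the Shannon entropy, in bits, of the empirical distribution of self-reported scores. A minimal sketch of that calculation, assuming the reports are collected as a list of integer scores (the function name is hypothetical):

```python
import math
from collections import Counter
from typing import Iterable


def empirical_entropy_bits(scores: Iterable[int]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of `scores`."""
    counts = Counter(scores)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        # Each observed score contributes -p * log2(p) bits.
        entropy -= p * math.log2(p)
    return entropy
```

A distribution concentrated on a single score gives 0 bits, while one spread evenly over all ten points of a 10-point scale gives log2(10) ≈ 3.32 bits, so values of 0.03–1.10 bits indicate heavily concentrated reports.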