ESR Testing Pipeline

Three-step protocol: (1) object-level prompting, (2) SAE-latent steering, (3) judge-model scoring of attempts.

Neighborhood — ranked by edge-count

paper

method

Judge Model Scoring · uses
Claude 4.5 Haiku segments each response into attempts and scores each attempt 0-100 for relevance
Probabilistic Bisection Algorithm · uses
Calibrates a per-latent threshold boost value so that first-attempt difficulty is consistent across latents
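
The calibration step can be illustrated with a generic probabilistic bisection search. This is a minimal sketch, not the paper's implementation: it assumes a hypothetical noisy oracle `says_above(boost)` that reports whether the target threshold lies above the queried boost value, and an assumed oracle accuracy `p_correct`. The grid bounds, grid size, and query budget are likewise illustrative choices.

```python
from typing import Callable, List


def median_index(posterior: List[float]) -> int:
    """Return the first index at which the cumulative posterior mass reaches 0.5."""
    total = 0.0
    for i, mass in enumerate(posterior):
        total += mass
        if total >= 0.5:
            return i
    return len(posterior) - 1


def probabilistic_bisection(
    says_above: Callable[[float], bool],  # hypothetical noisy oracle
    lo: float,
    hi: float,
    p_correct: float = 0.75,  # assumed probability that the oracle is right
    n_queries: int = 60,
    grid_size: int = 1001,
) -> float:
    """Estimate a threshold in [lo, hi] from noisy above/below answers."""
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    # Start from a uniform prior over candidate threshold locations.
    posterior = [1.0 / grid_size] * grid_size

    for _ in range(n_queries):
        # Always query at the current posterior median.
        q = median_index(posterior)
        above = says_above(grid[q])
        # Up-weight the side the oracle points to, down-weight the other side.
        for i in range(grid_size):
            on_indicated_side = (i > q) if above else (i <= q)
            posterior[i] *= p_correct if on_indicated_side else 1.0 - p_correct
        norm = sum(posterior)
        posterior = [m / norm for m in posterior]

    return grid[median_index(posterior)]
```

In an ESR setting, `says_above` might wrap one steered generation plus a judge score compared against a target first-attempt score; that wiring is an assumption and is not described in this entry.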

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

ESR Rate (metric) · concept · 0.768
Primary metric: percentage of responses containing multiple attempts that successfully improve on the first attempt
Does ESR reflect model scale, architecture, or training procedures? · question · 0.726
Central unresolved question about the mechanism behind ESR's apparent size-dependence
Experience Sampling Method (ESM) · method · 0.724
Human psychology method for repeated in-situ self-report; methodological inspiration for the paper's approach
Implicit ESR · concept · 0.721
Form of ESR occurring without explicit verbal self-interruption markers, not captured by current metrics
Explicit ESR · concept · 0.719
The form of ESR this paper focuses on, detected by treating verbal self-interruption phrases as segment boundaries
Scale-Dependent ESR · concept · 0.714
The observed pattern that ESR appears predominantly in the largest model tested, suggesting scale-dependence
We cannot isolate whether ESR reflects scale, architecture, or training procedures in Llama-3.3-70B · claim · 0.698
Epistemic limitation claim acknowledging confounds in the cross-model comparison
Can ESR be adversarially circumvented? · question · 0.692
Open security question about robustness of ESR-based defenses
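
The ESR Rate entry above can be made concrete with a short sketch. It rests on two assumptions that this page does not confirm: the denominator is all responses, and a response "improves" when its final attempt receives a higher judge score than its first. The function name `esr_rate` and the input layout (one list of 0-100 judge scores per response, in attempt order) are hypothetical.

```python
from typing import List


def esr_rate(responses: List[List[float]]) -> float:
    """Percentage of responses that have multiple attempts and end on a
    higher judge score than their first attempt (assumed definition)."""
    if not responses:
        return 0.0
    improved = sum(
        1
        for scores in responses
        # Single-attempt responses can never count as self-corrected.
        if len(scores) > 1 and scores[-1] > scores[0]
    )
    return 100.0 * improved / len(responses)


# Four responses: two of them have multiple attempts and finish higher.
example = [[20, 80], [50], [60, 30], [10, 5, 90]]
print(esr_rate(example))  # 50.0
```

A different reading of the metric, for example conditioning on multi-attempt responses only, would change the denominator but not the overall structure.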