method
active
method:esr-testing-pipeline
ESR Testing Pipeline
Three-step protocol: (1) object-level prompting, (2) SAE-latent steering, (3) judge model scoring of attempts
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (2)
method
- Claude 4.5 Haiku used to segment responses into attempts and score each attempt 0-100 for relevance
- Algorithm used to calibrate per-latent threshold boost values for consistent first-attempt difficulty
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Primary metric: percentage of responses containing multiple attempts that successfully improve on the first attempt
- Central unresolved question about the mechanism behind ESR's apparent size-dependence
- Human psychology method for repeated in-situ self-report; methodological inspiration for the paper's approach
- Form of ESR occurring without explicit verbal self-interruption markers, not captured by current metrics
- The form of ESR focused on in this paper, measured by verbal self-interruption phrases as segment boundaries
- The observed pattern that ESR appears predominantly in the largest model tested, suggesting scale-dependence
- We cannot isolate whether ESR reflects scale, architecture, or training procedures in Llama-3.3-70B (claim, 0.698): Epistemic limitation claim acknowledging confounds in the cross-model comparison
- Open security question about robustness of ESR-based defenses
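
The primary metric listed above (percentage of responses containing multiple attempts that improve on the first attempt) can be illustrated with a minimal sketch. It assumes each response has already been segmented into attempts and that each attempt carries a judge score from 0 to 100, as the pipeline describes. The function names, the choice of all responses as the denominator, and the rule that "improve" means any later attempt scoring strictly higher than the first are assumptions made for illustration, not details confirmed by this entry.

```python
from typing import Sequence


def improved_on_first(scores: Sequence[float]) -> bool:
    """True if the response has 2+ attempts and some later attempt beats the first.

    Assumption: "improves" means strictly higher judge score than attempt 1.
    """
    return len(scores) >= 2 and max(scores[1:]) > scores[0]


def multi_attempt_improvement_rate(responses: Sequence[Sequence[float]]) -> float:
    """Percentage of responses that are multi-attempt and improve on attempt 1.

    Assumption: the denominator is all responses, not only multi-attempt ones.
    """
    if not responses:
        return 0.0
    hits = sum(improved_on_first(scores) for scores in responses)
    return 100.0 * hits / len(responses)


# Hypothetical judge scores (0-100), one inner list per response.
example = [
    [20, 85],      # second attempt improves on the first
    [90],          # single attempt: cannot count
    [40, 30],      # second attempt is worse
    [10, 15, 70],  # a later attempt improves
]
rate = multi_attempt_improvement_rate(example)
print(f"{rate:.1f}% of responses improve on their first attempt")
```

If the metric is instead defined over multi-attempt responses only, the denominator would change to the count of responses with at least two attempts; the entry does not say which reading applies.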