concept
active
concept:esr-rate-metric

ESR Rate (metric)

Primary metric: percentage of responses containing multiple attempts that successfully improve on the first attempt
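A minimal Python sketch of how the primary metric could be computed. It also computes the neighboring secondary metric, the percentage of responses containing multiple attempts, because the two are read together.

The sketch rests on several assumptions:

- Each response has already been split into attempts.
- A judge model has given each attempt a numeric score, where higher is better.
- The denominator of the primary metric is the multi-attempt responses only.
- A response counts as improved if any later attempt scores strictly higher than the first.

The function names `multi_attempt_rate` and `esr_rate` and the list-of-scores input format are hypothetical. They are not taken from the paper.

```python
from typing import Sequence

# Hypothetical input format: one list of judge scores per response,
# in the order the attempts appear (higher score = better attempt).


def multi_attempt_rate(responses: Sequence[Sequence[float]]) -> float:
    """Secondary metric (assumed form): % of responses with more than one attempt."""
    if not responses:
        return 0.0
    multi = sum(1 for scores in responses if len(scores) > 1)
    return 100.0 * multi / len(responses)


def esr_rate(responses: Sequence[Sequence[float]]) -> float:
    """Primary metric (assumed form): % of multi-attempt responses in which
    some later attempt scores strictly higher than the first attempt."""
    # Assumption: the denominator is the multi-attempt responses only.
    multi = [scores for scores in responses if len(scores) > 1]
    if not multi:
        return 0.0
    # Assumption: "improves" means any later attempt beats the first one.
    improved = sum(1 for scores in multi if max(scores[1:]) > scores[0])
    return 100.0 * improved / len(multi)
```

With `responses = [[2.0], [1.0, 3.0], [4.0, 2.0], [1.0, 1.0, 2.5]]`, three of the four responses contain multiple attempts (75%). Two of those three improve on their first attempt (about 66.7%). If the paper divides by all responses, or compares only the final attempt with the first, the corresponding lines would change.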

Neighborhood · ranked by edge count

Concepts (1)

concept
  • The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs

Conceptual bridges

2-hop · via this concept's ideas

Where ideas in this concept connect to the rest of the corpus — the same concept, an analogy, or a restatement elsewhere.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Three-step protocol: (1) object-level prompting, (2) SAE-latent steering, (3) judge model scoring of attempts
  • The observed pattern that ESR appears predominantly in the largest model tested, suggesting scale-dependence
  • Secondary metric: percentage of responses containing multiple attempts, separating surface-level from actual self-correction
  • Reflection rate (concept, cosine 0.742)
    Ratio of reflection steps to total reasoning steps, used to quantify reflection behavior
  • Key evaluation metric: proportion of inputs for which an intervention successfully flips model output
  • Implicit ESR (concept, cosine 0.729)
    Form of ESR occurring without explicit verbal self-interruption markers, not captured by current metrics
  • Explicit ESR (concept, cosine 0.725)
    The form of ESR focused on in this paper, measured by verbal self-interruption phrases as segment boundaries
  • Fraction of trade challenges resolved by accepting the face-down offer rather than countering