concept
active
concept:esr-rate-metric
ESR Rate (metric)
Primary metric: percentage of responses that contain multiple attempts and successfully improve on the first attempt
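A minimal sketch of how this percentage could be computed, assuming each response has already been split into attempts and each attempt has a numeric judge score. The function name `esr_rate`, the input layout, the choice of all responses as the denominator, and the reading of "improve" as "the last attempt scores strictly higher than the first" are illustrative assumptions, not definitions taken from the paper.

```python
from typing import Sequence


def esr_rate(attempt_scores: Sequence[Sequence[float]]) -> float:
    """Return the percentage of responses that contain more than one
    attempt and whose last attempt scores higher than the first.

    attempt_scores[i] holds the judge scores for the attempts in
    response i, in order (hypothetical layout).
    """
    if not attempt_scores:
        return 0.0  # assumption: an empty evaluation set yields 0%
    improved = sum(
        1
        for scores in attempt_scores
        # assumption: "improves" means last attempt > first attempt
        if len(scores) > 1 and scores[-1] > scores[0]
    )
    # assumption: the denominator is all responses, not only multi-attempt ones
    return 100.0 * improved / len(attempt_scores)


# Four hypothetical responses: one single-attempt, two improving, one regressing.
example = [[20.0], [10.0, 80.0], [50.0, 30.0], [0.0, 40.0, 90.0]]
rate = esr_rate(example)
```

Under these assumptions, single-attempt responses and multi-attempt responses that get worse both count against the rate, which keeps the metric distinct from a plain count of multi-attempt responses.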
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Endogenous Steering Resistance · implements · The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
Conceptual bridges
2-hop · via this concept's ideas
Where ideas in this concept connect to the rest of the corpus: the same concept, an analogy, or a restatement elsewhere.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Three-step protocol: (1) object-level prompting, (2) SAE-latent steering, (3) judge model scoring of attempts
- The observed pattern that ESR appears predominantly in the largest model tested, suggesting scale-dependence
- Secondary metric: percentage of responses containing multiple attempts, separating surface from actual self-correction
- Ratio of reflection steps to total reasoning steps, used to quantify reflection behavior
- Key evaluation metric: proportion of inputs for which an intervention successfully flips model output
- Form of ESR occurring without explicit verbal self-interruption markers, not captured by current metrics
- The form of ESR focused on in this paper, measured by verbal self-interruption phrases as segment boundaries
- Fraction of trade challenges resolved by accepting the face-down offer rather than countering.