concept
active
concept:multi-attempt-response
Multi-Attempt Response
A response containing multiple distinct attempts to answer the prompt, used as the primary metric for Endogenous Steering Resistance (ESR).
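As a rough illustration of how a multi-attempt metric could be computed, the sketch below counts the share of responses that a judge has flagged as containing two or more attempts. The `JudgedResponse` type, the `attempt_count` field, and the threshold of two attempts are assumptions made for this example; the source does not specify how attempts are counted or stored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JudgedResponse:
    """Hypothetical record: a response plus a judge's attempt count."""
    text: str
    attempt_count: int  # assumed to come from a judge model


def is_multi_attempt(response: JudgedResponse) -> bool:
    # Assumption: "multi-attempt" means at least two distinct attempts.
    return response.attempt_count >= 2


def multi_attempt_rate(responses: Sequence[JudgedResponse]) -> float:
    """Percentage of responses that contain multiple attempts."""
    if not responses:
        return 0.0
    flagged = sum(is_multi_attempt(r) for r in responses)
    return 100.0 * flagged / len(responses)


# Made-up example data, for illustration only.
sample = [
    JudgedResponse("single answer", 1),
    JudgedResponse("first try, then a corrected try", 2),
    JudgedResponse("three separate tries", 3),
    JudgedResponse("single answer", 1),
]
print(multi_attempt_rate(sample))  # 50.0
```

Keeping the judge output as a count rather than a boolean leaves the threshold adjustable, which may matter if a different definition of "multiple attempts" is in use.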
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Endogenous Steering Resistance · implements · The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Secondary metric: percentage of responses containing multiple attempts, separating surface from actual self-correction
- Five judge models agree 90-96% on multi-attempt detection and ESR direction for same responses · finding · 0.755 · Validation that ESR findings are not artifacts of any particular judge model's evaluation methodology
- Using language model log probabilities of answer choices (A)/(B) to produce preference labels.
- Approach using multiple LLM agents for generation and critique, a key prior approach to improving reflection.
- Score = (sum of completed quartet values) × (number of quartets), making portfolio composition consequential.
- Quantitative characterization of ESR operating regime in boost level sweep
- Requirement that answers to questions be responsive as well as truthful; requires knowing that questioner will know the answer after receiving it.