Implicit ESR

Form of ESR occurring without explicit verbal self-interruption markers, not captured by current metrics

Neighborhood — ranked by edge-count

concept

Explicit ESR
related_to
The form of ESR focused on in this paper, measured by verbal self-interruption phrases as segment boundaries
Endogenous Steering Resistance
extends
The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Scale-Dependent ESR · concept · 0.742
The observed pattern that ESR appears predominantly in the largest model tested, suggesting scale-dependence
Does ESR reflect model scale, architecture, or training procedures? · question · 0.737
Central unresolved question about the mechanism behind ESR's apparent size-dependence
ESR Rate (metric) · concept · 0.729
Primary metric: percentage of responses containing multiple attempts that successfully improve on the first attempt
Can ESR be adversarially circumvented? · question · 0.728
Open security question about robustness of ESR-based defenses
ESR Testing Pipeline · method · 0.721
Three-step protocol: (1) object-level prompting, (2) SAE-latent steering, (3) judge model scoring of attempts
Meta-Prompting for ESR Enhancement · method · 0.711
Appending instructional meta-prompts to object-level prompts to deliberately enhance ESR in models
SimCLR · method · 0.699
Self-supervised contrastive learning method cited as an instance of NCE-type objectives that converge to a PMI kernel
We cannot isolate whether ESR reflects scale, architecture, or training procedures in Llama-3.3-70B · claim · 0.685
Epistemic limitation claim acknowledging confounds in the cross-model comparison
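
A minimal sketch of how a candidate list like the one above could be produced, assuming each entity has an embedding vector and typed edges are stored as name pairs. The function names, the data layout, and the toy vectors are hypothetical and are not taken from the tool that generated this page; only the 0.65 cosine threshold and the "no typed edge" condition come from the section heading.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold and no typed edge to target.

    embeddings:  dict mapping entity name -> embedding vector (assumed layout)
    typed_edges: iterable of (source, destination) name pairs (assumed layout)
    """
    # Entities already connected to the target by a typed edge, in either direction.
    linked = set()
    for src, dst in typed_edges:
        if src == target:
            linked.add(dst)
        elif dst == target:
            linked.add(src)

    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, round(score, 3)))

    # Highest similarity first, matching the order of the list above.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-dimensional vectors, purely illustrative.
toy_embeddings = {
    "Implicit ESR": [1.0, 0.0],
    "Explicit ESR": [1.0, 0.1],       # close, but already has a typed edge
    "ESR Rate (metric)": [1.0, 1.0],  # close enough and unlinked
    "Unrelated entity": [0.0, 1.0],   # orthogonal to the target
}
toy_edges = [("Implicit ESR", "Explicit ESR")]

candidates = untyped_neighbors("Implicit ESR", toy_embeddings, toy_edges)
print(candidates)
```

Entries that pass this filter are only candidates: a high cosine score can reflect shared vocabulary rather than a real relation, so each one still needs review before it is linked or merged.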