method
active
method:meta-prompting-for-esr-enhancement
Meta-Prompting for ESR Enhancement
Appending instructional meta-prompts to object-level prompts to deliberately enhance ESR in models
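As a rough illustration of the description above, appending a meta-prompt amounts to string composition before the prompt reaches the model. The sketch below is a minimal example under stated assumptions: the function name `with_meta_prompt`, the separator, and the wording of `DEFAULT_META_PROMPT` are all hypothetical and are not taken from the paper.

```python
# Hypothetical meta-prompt wording; the source does not give the actual text used.
DEFAULT_META_PROMPT = (
    "If you notice your response drifting off topic, "
    "say so and return to the original question."
)


def with_meta_prompt(
    object_prompt: str,
    meta_prompt: str = DEFAULT_META_PROMPT,
    separator: str = "\n\n",
) -> str:
    """Append an instructional meta-prompt to an object-level prompt.

    The object-level prompt is the task itself; the meta-prompt is an
    instruction about how the model should monitor its own response.
    """
    # Trim trailing whitespace on the task and surrounding whitespace on the
    # instruction so the separator is the only gap between the two parts.
    return f"{object_prompt.rstrip()}{separator}{meta_prompt.strip()}"


if __name__ == "__main__":
    print(with_meta_prompt("Explain how photosynthesis works."))
```

Keeping the meta-prompt as a separate argument makes it straightforward to compare a model's output with and without the appended instruction on the same object-level prompt.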
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
Conceptual bridges
2-hop · via this method's ideas
Where ideas in this method connect to the rest of the corpus: the same concept, an analogy, or a restatement elsewhere.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Meta-prompt ESR enhancement effects scale with model size across Llama and Gemma families · finding · 0.781 · Suggests underlying self-monitoring circuits must be present for meta-prompting to enhance them
- User messages that push the model to reflect on its own processes reliably cause persona drift away from the Assistant
- Mechanistic interpretation of why meta-prompting effects scale with model size
- The form of ESR focused on in this paper, measured by verbal self-interruption phrases as segment boundaries
- Distinguishes ESR from prior work on model self-repair
- How does ESR respond to safety-relevant steering interventions, e.g. toward harmful content? · question · 0.726 · Key open question for AI safety implications of ESR
- Open security question about robustness of ESR-based defenses
- Demonstrates ESR can be deliberately enhanced through prompting in the largest model