concept
active
concept:meta-reflection-prompts-as-drift-triggers
Meta-Reflection Prompts as Drift Triggers
User messages that push the model to reflect on its own processes reliably cause persona drift away from the Assistant
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Persona drift · associated_with · Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Appending instructional meta-prompts to object-level prompts to deliberately enhance ESR in models
- Central interpretive claim of the paper, supported by steering vector experiments.
- Mechanistic interpretation of why meta-prompting effects scale with model size
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
- User requests for the model to describe subjective experiences reliably cause persona drift
- Interpretive claim about the locus of reflection in transformer architecture.
- Empirical interpretation of which reference baseline yields more useful steering vectors.
- Reflection level where explicit cue words (e.g., 'wait') prompt the model to inspect and revise reasoning.