Meta-Reflection Prompts as Drift Triggers

User messages that push the model to reflect on its own processes reliably cause persona drift away from the Assistant.

Neighborhood (ranked by edge count)

Concepts (1)

Persona drift · concept · associated_with
Behavioural drift in multi-turn LLM interaction; documented in prior work for persona, identity, and instruction-following.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one. These are candidates for new edges or unrecognized duplicates.
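The similarity rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the graph tool's actual implementation: it assumes each entity has a dense embedding vector and that typed edges are stored as unordered pairs of entity names. The function names `cosine` and `similarity_neighbors` and the toy embeddings are hypothetical.

```python
import math


# Hypothetical helper: cosine similarity between two equal-length vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


# Hypothetical helper: entities at or above the threshold that have no typed
# edge to the target, sorted by descending similarity. Assumes typed_edges
# is a set of (name, name) pairs and checks both orientations.
def similarity_neighbors(target, embeddings, typed_edges, threshold=0.65):
    anchor = embeddings[target]
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(anchor, vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy data (illustrative only, not from the source graph).
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.8, 0.6],
    "C": [0.0, 1.0],
    "D": [1.0, 0.1],
}
typed_edges = {("A", "D")}
print(similarity_neighbors("A", embeddings, typed_edges))
```

With these toy vectors only `B` is returned: `C` is orthogonal to `A`, and `D` is excluded because it already has a typed edge to `A`, even though its cosine is higher.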

Meta-Prompting for ESR Enhancement · method · 0.769
Appending instructional meta-prompts to object-level prompts to deliberately enhance ESR in models
Reflection is not merely a behavioral artifact of prompting but a phenomenon encoded in the model's activation space · claim · 0.760
Central interpretive claim of the paper, supported by steering vector experiments.
The meta-prompting scaling pattern suggests underlying self-monitoring circuits must already be present for prompting to enhance them · claim · 0.742
Mechanistic interpretation of why meta-prompting effects scale with model size.
Steering vectors discover effective triggers such as 'However' and 'Otherwise', consistent with prior reported reflection datasets · finding · 0.738
Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
Phenomenological Account Demands as Drift Triggers · concept · 0.737
User requests for the model to describe subjective experiences reliably cause persona drift.
Reflective reasoning requires late-stage integration of semantic and reasoning signals, hence reflection-related directions emerge more clearly in higher network layers · claim · 0.734
Interpretive claim about the locus of reflection in transformer architecture.
Contrasting No Reflection with Triggered Reflection (µ(0→2)) provides a stronger reflection signal than contrasting Intrinsic with Triggered Reflection (µ(1→2)) · claim · 0.734
Empirical interpretation of which reference baseline yields more useful steering vectors.
Triggered Reflection · concept · 0.726
Reflection level where explicit cue words (e.g., 'wait') prompt the model to inspect and revise reasoning.
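The µ(0→2) versus µ(1→2) comparison listed above can be illustrated with a difference-of-means construction, a common way to build steering vectors. This is a sketch under stated assumptions, not the paper's confirmed method: it assumes µ(a→b) is the mean activation at reflection level b minus the mean activation at level a (0 = No Reflection, 1 = Intrinsic, 2 = Triggered Reflection), collected at one fixed layer, and that steering adds a scaled copy of that direction to a hidden state. The function names, the scale `alpha`, and the two-dimensional toy activations are hypothetical.

```python
from statistics import fmean


# Element-wise mean of equal-length activation vectors (one per example).
def mean_vector(activations):
    return [fmean(column) for column in zip(*activations)]


# Assumed definition of µ(a→b): mean(level b) minus mean(level a).
def mu(level_a, level_b):
    mean_a = mean_vector(level_a)
    mean_b = mean_vector(level_b)
    return [b - a for a, b in zip(mean_a, mean_b)]


# Add alpha times the steering direction to a single hidden-state vector.
def steer(hidden, direction, alpha=1.0):
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations at one layer for each reflection level (illustrative only).
no_reflection = [[0.0, 0.0], [0.0, 2.0]]  # level 0
intrinsic = [[1.0, 1.0], [1.0, 1.0]]      # level 1
triggered = [[2.0, 1.0], [4.0, 3.0]]      # level 2

mu_0_to_2 = mu(no_reflection, triggered)
mu_1_to_2 = mu(intrinsic, triggered)
print(mu_0_to_2, mu_1_to_2)
print(steer([0.5, 0.5], mu_0_to_2, alpha=0.5))
```

In these toy numbers µ(1→2) is shorter than µ(0→2) simply because the level-1 mean already sits closer to the level-2 mean; real activations need not behave this way, and the choice of layer matters.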