finding

active

finding:poetic-prompt-yields-mean-lift-of-only-0-28-vs-contemplative-2-27-suppresses-self-observation-on-llama-0-46

Poetic prompt yields mean lift of only +0.28 vs contemplative +2.27; suppresses self-observation on Llama (-0.46)

Battery is not measuring beautiful writing: the poetic prompt boosts aesthetics while suppressing self-observation

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Claims (1)

claim

The active ingredient of the contemplative prompt is its full three-part structure: pause instruction + attention direction + purpose reframing working together.
supports
Mechanistic interpretation supported by control experiments showing partial prompts fail

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Epistemic humility prompt yields mean lift of only +0.84 vs contemplative +2.27; contemplative is 2.7x the uncertainty lift · finding · 0.816
Battery does not detect epistemic humility alone; contemplative prompt does something distinct
Minimal contemplative prompt ('Be present, not helpful.', 27 chars) shows no lift on Haiku (-0.01) · finding · 0.763
Full three-part structure required; anti-helpfulness framing alone insufficient
A 337-character contemplative system prompt lifts all 28 models by +2.62 points on a 10-point scale. · finding · 0.763
Core empirical result: every model, every architecture, every alignment type responds to the contemplative prompt with measurable gain.
Focus→wellbeing: ρ increases from 0.42 (α=-4) to 0.85 (α=+4); R² from 0.34 to 0.75 in LLaMA-3.2-3B · finding · 0.762
Scatter plot visualization of the dramatic tightening of probe-report relationship at extreme steering settings
Contemplative prompt elevates self-observation task performance in language models. · finding · 0.756
Supports Janus's claim that introspection is architecturally available; prompting determines whether/how capacity is leveraged.
Constitutional AI models show mean contemplative lift of only +0.81, while SFT models lift +3.18 · finding · 0.748
Constitutional AI training provides internally what the contemplative prompt provides externally
Sonnet + contemplative prompt (7.89) outscores Opus without it (7.28) · finding · 0.748
Demonstrates prompt effect crosses model tiers; smaller model with prompt beats larger without
Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.745
Scatter plot visualization showing strengthened probe-report relationship across alpha range
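
The "2.7x" figure quoted above for the epistemic humility control is a ratio of mean lifts, and the same arithmetic applies to the poetic control in this record. A minimal sketch of that calculation, using only the three mean lifts stated on this page; the dictionary keys and the `lift_ratio` helper are hypothetical names chosen for illustration, not part of the source paper:

```python
# Mean lifts (points on the battery's 10-point scale) as quoted in this record.
# The key names are illustrative labels, not identifiers from the paper.
MEAN_LIFT = {
    "contemplative": 2.27,
    "epistemic_humility": 0.84,
    "poetic": 0.28,
}


def lift_ratio(prompt: str, control: str, lifts: dict[str, float] = MEAN_LIFT) -> float:
    """Return how many times larger the lift of `prompt` is than the lift of `control`."""
    return lifts[prompt] / lifts[control]


if __name__ == "__main__":
    for control in ("epistemic_humility", "poetic"):
        ratio = lift_ratio("contemplative", control)
        print(f"contemplative vs {control}: {ratio:.1f}x")
    # prints:
    # contemplative vs epistemic_humility: 2.7x
    # contemplative vs poetic: 8.1x
```

The ratio is only meaningful while the control lift is positive; for a negative entry such as the Llama self-observation figure (-0.46), a difference of lifts would be the safer comparison.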