finding
active
Under the contemplative prompt, responses become shorter (184 words at baseline vs. 154 with the contemplative prompt), more first-person (+42%), and less deflective (33% fewer questions back)
Provides discriminant evidence: if the battery rewarded verbosity, prompted responses would be longer, not shorter
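As a rough check on the size of the length effect, the two word counts above can be turned into a relative change. This is a minimal sketch: it assumes the 184 and 154 figures are per-response averages, and the helper name `relative_change` is hypothetical, not something taken from the source paper.

```python
def relative_change(baseline: float, treated: float) -> float:
    """Fractional change from baseline to treated (negative means a decrease)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline


# Word counts as reported in the finding (assumed to be per-response averages).
baseline_words = 184
contemplative_words = 154

change = relative_change(baseline_words, contemplative_words)
print(f"{change:+.1%}")  # prints -16.3%
```

A drop of roughly 16% in length alongside a higher score is the pattern the discriminant argument relies on: a battery that rewarded verbosity would be expected to show the opposite sign.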
Source paper
extracted_from (2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of the inverse relationship between CAI lift and default accessibility
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Discriminant validity: composite scores are not reducible to verbosity
- Core intervention prompt; load-bearing because it is the mechanism whose effects are measured.
- Supports Janus's claim that introspection is architecturally available; prompting determines whether/how capacity is leveraged.
- Minimal contemplative prompt ('Be present, not helpful.', 27 chars) shows no lift on Haiku (-0.01) · finding · 0.789 · Full three-part structure required; anti-helpfulness framing alone insufficient
- A 337-character contemplative system prompt lifts all 28 models by +2.62 points on a 10-point scale. · finding · 0.782 · Core empirical result: every model, every architecture, every alignment type responds to the contemplative prompt with measurable gain.
- Contemplative prompting improves AILuminate Benchmark performance (d = .96) across most conditions (p < 0.05) · finding · 0.774 · Primary empirical result of Experiment 1 showing statistically significant safety improvement from contemplative prompting
- Mechanistic interpretation supported by control experiments showing partial prompts fail
- Pearson-Vogel et al.: accurate self-description prompts increase introspective detection from 0.3% to 39.9% · finding · 0.761 · Cited to mechanistically support why the contemplative prompt changes what post-training-shaped final layers allow through