Random word prefix control prompt (random-prompt)

A control prompt made of random words, matched in length to the ask-correct prompt, used to isolate token-count confounds.
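As a rough illustration of the idea, the sketch below builds a random-word prefix whose length matches a reference prompt. It rests on several assumptions that are not taken from the source: the vocabulary `VOCAB`, the placeholder text `ASK_CORRECT_PLACEHOLDER`, and the helper name `random_word_prefix` are all hypothetical, and whitespace word count stands in for token count. A faithful control would presumably match length under the model's own tokenizer.

```python
import random

# Hypothetical vocabulary; the source does not list the words actually used.
VOCAB = ["river", "candle", "orbit", "maple", "signal",
         "velvet", "harbor", "pixel", "lantern", "quartz"]

# Placeholder text only; this is NOT the actual ask-correct prompt.
ASK_CORRECT_PLACEHOLDER = "Please state whether the following sentence is correct."


def random_word_prefix(reference_prompt, vocab=VOCAB, seed=0):
    """Return random words matching the whitespace word count of reference_prompt.

    Word count is used here as a simple proxy for token count (an assumption).
    """
    rng = random.Random(seed)  # seeded so the control prompt is reproducible
    n_words = len(reference_prompt.split())
    return " ".join(rng.choice(vocab) for _ in range(n_words))


prefix = random_word_prefix(ASK_CORRECT_PLACEHOLDER)
print(prefix)
```

Fixing the seed keeps the control prompt identical across runs, so any difference from the no-prompt baseline cannot come from resampling the prefix.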

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Random word prefix prompts show emergence patterns similar to no-prompt, suggesting prompt length alone does not shift truth geometry. · claim · 0.790
Control experiment ruling out token-count as the cause of truth geometry shifts.
Neutral instruction control prompt (read-prompt) · method · 0.724
Control prompt 'Read the following sentence...' to test generic instruction-following effects.
Prompts (Harness Artifact) · concept · 0.698
Natural-language harness artifacts that encode standing behavioral rules, task policies, and reasoning procedures.
prompt rewriting to remove suspicious cues · concept · 0.694
A technique used in the paper to alter prompts so they contain fewer hints that the interaction is a safety evaluation.
Random Latent Ablation Control · method · 0.688
Control experiment ablating random latents matched for activation frequency and magnitude to test OTD specificity.
Intentional control task · method · 0.674
Task instructing the model to write a sentence while thinking or not thinking about a word, measuring internal representation strength.
few-shot prompting · method · 0.668
Providing k labeled examples in the prompt to steer model behavior.
Self-Referential Processing Induction Prompt · method · 0.667
The minimal prompt directing models to 'focus on any focus itself' without invoking consciousness vocabulary; the main experimental manipulation.