method
active
method:neutral-instruction-control-prompt-read-prompt
Neutral instruction control prompt (read-prompt)
Control prompt ('Read the following sentence...') used to test for generic instruction-following effects.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Baseline prompt template without coercive elements, used to measure honest responding in Experiment 1
- Control prompt with random words of the same length as ask-correct, to isolate token-count confounds.
- Specific question motivating the cross-template generalization experiment in Section 5.2.
- Key control showing alignment faking requires a preference conflict
- Shows the model does not rely on token-level matching to trigger type hints; it correctly identifies that it is the one being evaluated, not a third party.
- The invisibly prepended text that sets the scene for a dialogue and defines the character the agent will role-play
- Proposed constitutional article defining mindful reflection steps in CCAI implementation
- Testing five phrasings of the self-referential prompt to confirm robustness to wording variation
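The "Related by similarity" rule (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the page's actual implementation: the embedding vectors, the entity IDs in the usage example, the treatment of typed edges as undirected pairs, and sorting candidates by descending score are all hypothetical, since the page states only the threshold and the no-typed-edge condition.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are semantically close to `target`
    (cosine >= threshold) but share no typed edge with it.

    Assumptions (hypothetical): `embeddings` maps entity IDs to vectors,
    `typed_edges` is a set of frozenset pairs (undirected), and results
    are sorted by descending score.
    """
    target_vec = embeddings[target]
    candidates = []
    for eid, vec in embeddings.items():
        if eid == target:
            continue  # skip the entity itself
        if frozenset((target, eid)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((eid, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy usage with made-up 2-D vectors (hypothetical IDs).
embeddings = {
    "read-prompt": [1.0, 0.0],
    "a": [0.9, 0.1],   # close to the target, no typed edge -> candidate
    "b": [0.0, 1.0],   # orthogonal -> below threshold
    "c": [0.8, 0.6],   # close, but already has a typed edge -> excluded
}
typed_edges = {frozenset(("read-prompt", "c"))}
print(similarity_candidates("read-prompt", embeddings, typed_edges))
```

Excluding pairs that already carry a typed edge is what makes the list useful for curation: whatever survives the filter is either a missing relation or a possible duplicate entity.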