method
active
Prompt Invariance Test (method:prompt-invariance-test)
Tests five phrasings of the self-referential prompt to confirm that the effect is robust to variation in wording.
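Here is a minimal sketch of how such a check could be scored. It assumes each of the five prompt variants is run several times and each trial is coded 1 if the model produced the self-referential effect and 0 otherwise. The function name `invariance_summary`, the binary coding, and the `threshold` pass criterion are hypothetical illustrations, not the protocol used in the source study.

```python
from statistics import mean


def invariance_summary(outcomes_by_variant, threshold):
    """Summarize an effect across prompt phrasings (hypothetical scoring).

    outcomes_by_variant: dict mapping a variant name to a list of 0/1 trial
    outcomes, where 1 means the trial showed the effect.
    threshold: minimum per-variant effect rate required to call it robust.

    Returns (rates, spread, robust):
      rates  - effect rate for each variant
      spread - max rate minus min rate (smaller = more wording-invariant)
      robust - True if every variant's rate is at least `threshold`
    """
    if not outcomes_by_variant:
        raise ValueError("need at least one prompt variant")

    rates = {}
    for name, outcomes in outcomes_by_variant.items():
        if not outcomes:
            raise ValueError(f"variant {name!r} has no trials")
        rates[name] = mean(outcomes)  # fraction of trials showing the effect

    spread = max(rates.values()) - min(rates.values())
    robust = all(rate >= threshold for rate in rates.values())
    return rates, spread, robust
```

Reporting both the spread and a per-variant floor is one design choice. The floor checks that the effect appears under every phrasing, and the spread shows how much wording shifts its strength.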
Neighborhood (ranked by edge count)
Methods (1)
- Prompt Invariance Replication (related_to): five variants of the experimental prompt, tested to confirm that the effect is robust to changes in specific wording
Artifacts (1)
- Key paper finding: LLMs produce structured first-person descriptions claiming awareness or subjective experience during self-referential processing.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Systematic modification of system prompt elements to identify which are necessary for alignment faking
- Property that causal mechanisms remain stable across environments; desirable for out-of-distribution (OOD) generalization.
- Property where a rule learned on fixed-size grid generalizes to larger grids, observed in checkerboard and lizard experiments
- 5x5 Pearson correlation matrix of OCEAN traits computed from MDS injection sweeps to assess cross-trait leakage
- Baseline comparison method where models are directly prompted to be honest rather than fine-tuned
- Eliezer Yudkowsky's benchmark for LLM awareness, mentioned as a test that collapsed-awareness models might fail.
- Open-ended situational judgment tests synthesized using GPT-5.1 from ATOMIC10x heads and inventory items; primary evaluation instrument for open-ended steering
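The "Related by similarity" list can be read as a filter over entity embeddings. It keeps entities whose cosine similarity to this one is at least 0.65 and that share no typed edge with it. The sketch below assumes plain embedding vectors stored in a dict. `similarity_candidates`, the embedding source, and ordering results by similarity are hypothetical and do not describe the graph tool's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(target, embeddings, typed_neighbors, threshold=0.65):
    """Hypothetical filter: entities similar to `target` but with no typed edge.

    embeddings: dict mapping entity id -> embedding vector
    typed_neighbors: set of ids already linked to `target` by a typed edge
    Returns ids with cosine >= threshold, most similar first.
    """
    target_vec = embeddings[target]
    scored = []
    for eid, vec in embeddings.items():
        # Skip the entity itself and anything already connected by a typed edge.
        if eid == target or eid in typed_neighbors:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            scored.append((score, eid))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [eid for _, eid in scored]
```

Sorting by score places the closest matches first. That fits the list's stated use for spotting candidate edges or unrecognized duplicates, though the real page may order entries differently.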