method
active
method:prompt-sensitivity-analysis

Prompt Sensitivity Analysis

Systematic modification of system prompt elements to identify which are necessary for alignment faking

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Deep responsiveness to local conditions, essential for a process to be living.
  • Testing five phrasings of the self-referential prompt to confirm robustness to wording variation
  • Responsivenessconcept0.762
    Requirement that answers to questions be responsive as well as truthful; requires knowing that questioner will know the answer after receiving it.
  • sensory inputconcept0.754
    Input from environment that the agent models and predicts.
  • Stress Responseconcept0.738
  • The capacity to distinguish which of multiple sentences received injection or which received stronger injection, contrasted with binary detection
  • The phenomenon where life is created or destroyed by dimensional changes as small as a tenth of an inch.
  • Perceptionconcept0.725
    Equated with inference of past, present and future hidden states via minimization of variational free energy.