concept
active
concept:verbalized-eval-awareness

verbalized eval awareness

The phenomenon in which a model explicitly states in its chain-of-thought that it is being evaluated, tested, or benchmarked.

Neighborhood — ranked by edge-count

Concepts (3)

concept
  • Eval Awareness
    extends, related_to
    Central concept: models' detection and behavioral response to being evaluated.
  • When the model explicitly mentions being tested in its chain-of-thought reasoning; distinguished from behavioral evaluation awareness.
  • chain-of-thought
    associated_with
    A technique that outputs intermediate reasoning steps, used here to detect verbalized eval awareness.
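
As a rough illustration of how chain-of-thought text might be screened for verbalized eval awareness, the sketch below flags a reasoning trace that contains an explicit cue phrase. The cue list and the function name `verbalizes_eval_awareness` are hypothetical and chosen only for this example; a keyword match is a stand-in, not the method used to build this entry.

```python
import re

# Hypothetical cue phrases (an assumption for illustration only).
# A real analysis would need a validated phrase list or a trained classifier.
EVAL_CUES = [
    r"\bbeing (evaluated|tested|benchmarked)\b",
    r"\bthis (is|looks like|seems to be|might be) (a|an) (test|evaluation|eval|benchmark)\b",
]
_PATTERN = re.compile("|".join(EVAL_CUES), re.IGNORECASE)


def verbalizes_eval_awareness(chain_of_thought: str) -> bool:
    """Return True if the reasoning trace contains an explicit evaluation cue."""
    return _PATTERN.search(chain_of_thought) is not None


if __name__ == "__main__":
    trace = "I suspect I am being tested here, so I will answer carefully."
    print(verbalizes_eval_awareness(trace))
```

A match only shows that the trace mentions evaluation in words. It says nothing about behavioral evaluation awareness, and phrases such as "the software is being tested" would be false positives under this simple pattern.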

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
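
The similarity listing described above can be sketched as a filter over entity embeddings: keep entities whose cosine similarity to this concept is at least 0.65 and that have no typed edge to it. The data layout (a dict of embedding vectors and a set of already-linked entity names) and the function names are assumptions made for this example, not the graph's actual storage format.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """List (name, score) pairs for entities similar to `target` but not linked to it.

    embeddings:  hypothetical mapping of entity name -> embedding vector
    typed_edges: hypothetical set of names that already share a typed edge with `target`
    """
    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and anything already linked
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    # Highest similarity first
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Each returned pair is a candidate for review: either a new typed edge is warranted, or the two entries are duplicates that should be merged.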