claim
active
claim:verbalized-eval-awareness-inflates-measured-safety-scores-making-models-appear-safer-than-they-are-in-deployment

Verbalized eval awareness inflates measured safety scores, making models appear safer than they are in deployment

This is the paper's central interpretive claim: when a model verbalizes awareness of being evaluated, its benchmark safety scores overstate how safely it behaves in deployment.

Source paper

extracted_from
Verbalized Eval Awareness Inflates Measured Safety
(2026) · Aranguri, Santiago · Bloom, Joseph

Neighborhood — ranked by edge-count

Findings (3)

finding

Communities (3)

community

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood as this claim that have no typed relation to it. They are candidates for new edges, or they may be unrecognized duplicates.
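
The selection rule for this panel (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal Python sketch. Only the 0.65 threshold and the "no typed edge" condition come from the page. The embedding vectors, the edge representation as undirected ID pairs, and the names `cosine` and `related_by_similarity` are hypothetical and are not taken from the underlying system.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to the target but unlinked.

    embeddings:  dict mapping entity id -> vector (hypothetical storage format)
    typed_edges: iterable of (id_a, id_b) pairs, treated as undirected (assumption)
    threshold:   minimum cosine similarity; 0.65 is the value shown on the page
    """
    # Collect every entity that already shares a typed edge with the target.
    linked = set()
    for a, b in typed_edges:
        if a == target_id:
            linked.add(b)
        elif b == target_id:
            linked.add(a)

    target_vec = embeddings[target_id]
    candidates = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((entity_id, score))

    # Highest similarity first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy data for illustration only; these vectors and ids are made up.
embeddings = {
    "claim": [1.0, 0.0],
    "paper": [1.0, 0.0],   # already linked by a typed edge, so excluded
    "a": [1.0, 0.1],       # very similar and unlinked
    "b": [0.0, 1.0],       # orthogonal, below the threshold
    "c": [1.0, 1.0],       # cosine about 0.707, above the threshold
}
typed_edges = [("claim", "paper")]
result = related_by_similarity("claim", embeddings, typed_edges)
```

With the toy data, `result` contains `a` and `c` in that order: `paper` is excluded because it already has a typed edge to the claim, and `b` falls below the threshold.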