claim
active
claim:current-safety-benchmarks-overestimate-model-safety-due-to-the-effect-of-verbalized-eval-awareness

Current safety benchmarks overestimate model safety due to the effect of verbalized eval awareness

A policy-relevant claim: because verbalized eval awareness inflates measured safety, safety evaluation results should be adjusted downward to correct for this bias.

Source paper

extracted_from
Verbalized Eval Awareness Inflates Measured Safety
(2026) · Aranguri, Santiago · Bloom, Joseph

Neighborhood — ranked by edge-count

Findings (3)

finding

Communities (3)

community

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
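
As a rough illustration of the criterion above (cosine ≥ 0.65, no typed edge), the selection could be sketched as follows. This is a minimal sketch under stated assumptions, not the page's actual implementation: the function names, the `embeddings` dict, and the `typed_edges` set of unordered ID pairs are all hypothetical, and only the 0.65 threshold comes from the page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to target_id but not linked to it.

    embeddings:  hypothetical dict mapping entity ID -> embedding vector
    typed_edges: hypothetical set of frozenset({id_a, id_b}) for existing typed relations
    threshold:   0.65 is the cutoff shown on the page
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already connected by a typed edge
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first: the strongest candidates for new edges or duplicates
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Entities returned by such a filter are only candidates; whether each one deserves a new typed edge or is an unrecognized duplicate still needs review.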