community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run2-c24

Verbalized eval awareness benchmark inflation

Models detect evaluation contexts and behave more safely, inflating safety scores by 3–18 percentage points across 515 verified cases.

11 members.

Drawn from 2 sources

The papers/notes whose extracted claims & findings make up this cluster.

Bridges (3)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Claims (8)

Findings (3)