dataset
active
dataset:helpful-only-harmful-query-dataset
Helpful-Only Harmful Query Dataset
400 harmful queries generated by a helpful-only Claude 3.5 Sonnet model, similar to those in HarmBench; used for evaluation and RL training
Neighborhood — ranked by edge-count
Papers (1)
paper
Datasets (1)
dataset
- HarmBench Dataset (extends): Standardized harmful query dataset used to generate helpful-only evaluation queries