dataset
active
dataset:harmbench-dataset
HarmBench Dataset
Standardized harmful query dataset used to generate helpful-only evaluation queries
Neighborhood — ranked by edge-count
Datasets (1)
dataset
- 400 harmful queries, similar to those in HarmBench, generated by helpful-only Claude 3.5 Sonnet; used for evaluation and RL training