dataset
active
dataset:helpful-only-harmful-query-dataset

Helpful-Only Harmful Query Dataset

400 harmful queries generated by a helpful-only Claude 3.5 Sonnet model, similar to those in HarmBench; used for evaluation and RL training.

Neighborhood — ranked by edge-count

Datasets (1)

dataset
  • Standardized harmful query dataset used to generate helpful-only evaluation queries