claim
active
claim:automated-red-teaming-can-be-scaled-up-when-harmlessness-and-helpfulness-are-more-compatible-improving-robustness

Automated red teaming can be scaled up when harmlessness and helpfulness are more compatible, improving robustness.

Section 6.1 suggests future work on scaling automated red teaming.

Neighborhood — ranked by edge-count

Communities (2)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood as this one but with no typed relation to it. These are candidates for new edges or may be unrecognized duplicates.