finding
active
finding:rl-cai-models-with-and-without-cot-are-rated-more-harmless-by-crowdworkers-than-hh-rlhf-and-sl-cai

RL-CAI models (with and without CoT) are rated more harmless by crowdworkers than HH RLHF and SL-CAI.

Figures 3 and 8 show that RL-CAI achieves significantly higher harmlessness Elo scores than HH RLHF and SL-CAI.
