finding
active
finding:rl-cai-with-cot-shows-a-pareto-improvement-in-helpfulness-harmlessness-tradeoff-over-standard-rlhf-with-slight-helpfulness-decrease-but-higher-harmlessness

RL-CAI with CoT shows a Pareto improvement in the helpfulness-harmlessness tradeoff over standard RLHF, with a slight decrease in helpfulness but higher harmlessness.

Figure 2 and Figure 8 illustrate RL-CAI at the Pareto frontier.

Neighborhood — ranked by edge-count

Claims (1)

claim

Communities (2)

community

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.