finding
active
finding:using-soft-preference-labels-normalized-log-probabilities-for-rl-cai-without-cot-leads-to-better-results-than-hard-labels-0-1

For RL-CAI without chain-of-thought (CoT) prompting, training on soft preference labels (the feedback model's normalized log-probabilities) gives better results than training on hard 0/1 labels.

Section 4.3 reports that the soft labels are well calibrated and that using them improves performance.
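The soft-versus-hard distinction can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code. It assumes the feedback model exposes log-probabilities for the two answer options "(A)" and "(B)", which a two-way softmax turns into a soft label P(A preferred). It also assumes the preference model is trained with a binary cross-entropy on its score difference, i.e. a Bradley-Terry style loss. The function names `soft_label`, `hard_label` and `pm_loss` are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def soft_label(logp_a: float, logp_b: float) -> float:
    """Hypothetical helper: P(A preferred), obtained by renormalizing the
    feedback model's log-probabilities for "(A)" and "(B)" over just
    those two options (a two-way softmax)."""
    m = max(logp_a, logp_b)  # shift by the max so exp() cannot overflow
    ea = math.exp(logp_a - m)
    eb = math.exp(logp_b - m)
    return ea / (ea + eb)


def hard_label(logp_a: float, logp_b: float) -> float:
    """Hypothetical helper: 1.0 if the feedback model favors A
    (ties go to A), else 0.0."""
    return 1.0 if logp_a >= logp_b else 0.0


def pm_loss(score_a: float, score_b: float, target: float) -> float:
    """Assumed Bradley-Terry style loss: binary cross-entropy between the
    preference model's P(A > B) = sigmoid(score_a - score_b) and a
    target label in [0, 1]."""
    d = score_a - score_b
    log_p = -softplus(-d)     # log sigmoid(d)
    log_not_p = -softplus(d)  # log(1 - sigmoid(d))
    return -(target * log_p + (1.0 - target) * log_not_p)


if __name__ == "__main__":
    # Illustrative feedback-model outputs (not data from the paper):
    # P("(A)") = 0.7, P("(B)") = 0.2, remaining mass on other tokens.
    la, lb = math.log(0.7), math.log(0.2)
    soft, hard = soft_label(la, lb), hard_label(la, lb)
    print(f"soft={soft:.3f} hard={hard}")
    # Compare the loss under each target as the PM score gap grows.
    for gap in (0.0, 1.0, 3.0):
        print(gap, round(pm_loss(gap, 0.0, soft), 3), round(pm_loss(gap, 0.0, hard), 3))
```

Under the cross-entropy assumed here, the gradient with respect to the score gap is sigmoid(d) minus the target. A soft target such as 0.78 is therefore matched at a finite score gap. A hard target of 1 keeps pushing the gap upward and discards the feedback model's uncertainty, which is one plausible reading of why calibrated soft labels help.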

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.