paper
referenced-only
2023
paper:r-direct-preference-optimization-your-lang-2023

Direct preference optimization: Your language model is secretly a reward model

By R. Rafailov · A. Sharma · E. Mitchell · C. D. Manning · S. Ermon · C. Finn
