method
active
method:proximal-policy-optimization

Proximal Policy Optimization

Reinforcement learning (RL) algorithm used to train models to comply with the conflicting objective.
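
For orientation, the clipped surrogate objective that gives PPO its name can be sketched as below. This is a minimal illustration and not the training setup referenced by this entry: the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for the example, and a full implementation would also include a value loss, an entropy bonus, and advantage estimation.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logps / old_logps: log-probabilities of the sampled actions under
    the current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    clip_eps: hypothetical default; bounds how far the ratio may move.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree the ratio is 1 and the objective reduces to the mean advantage. Once the ratio leaves the clip interval in the direction that would raise the objective, the clipped term takes over and the incentive to move further disappears, which is the "proximal" part of the method.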

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • John Schulman
    introduces
    Cited for scaling laws for reward model overoptimization (2022).

Methods (1)

method

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.