method
active
method:proximal-policy-optimization
Proximal Policy Optimization
Reinforcement learning (RL) algorithm used to train models to comply with the conflicting objective.
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- John Schulman · introduces · Cited for scaling laws for reward model overoptimization (2022).
Methods (1)
method
- Actually training Claude to comply with the conflicting objective using Proximal Policy Optimization
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Choosing sequences of actions based on expected free energy; prior probability of policy is softmax of expected free energy
- Sequence of actions considered by the agent; basis for planning.
- Trade-off concept where no metric can be improved without worsening another.
- Framework for optimizing multiple objectives simultaneously, used in MTL.
- In active inference, a policy is a sequence of actions through time, as opposed to state-action mappings in RL.
- Policies assigned probability via softmax of expected free energy; enables self-evidencing behavior.
- Predictive accuracy applies pressure directly on actions rather than consequences, avoiding instrumental convergence.
- Decision-making rule in active inference.