quote
active
"reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal"
Operational definition of RL used throughout the paper, quoted from Sutton.
Source paper
extracted_from (2021) · Noor Sajid · Philip J. Ball · Thomas Parr · Karl J. Friston
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
- Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
- AI training method inspired by behaviorism, used for autonomous cars and drones; cited as a bioinspired success.
- Variant of RLHF where human feedback is replaced with AI-generated feedback for harmlessness.
- Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
- §3 Discussion.
- Key insight linking individual rewards to system-level learning.
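
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed relations are stored as unordered pairs of entity names; the function names, the toy vectors, and the data layout are hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, similarity) pairs for entities that are similar to
    `target` but share no typed edge with it, most similar first.

    embeddings:  dict mapping entity name -> vector (hypothetical layout)
    typed_edges: set of (name, name) pairs, treated as undirected (assumption)
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            results.append((name, sim))
    return sorted(results, key=lambda pair: -pair[1])


# Toy data for illustration only.
embeddings = {
    "quote": [1.0, 0.0],
    "near": [1.0, 0.1],      # very similar, no typed edge -> listed
    "linked": [1.0, 1.0],    # similar (about 0.71) but already linked -> skipped
    "far": [0.0, 1.0],       # orthogonal -> below threshold
}
typed_edges = {("quote", "linked")}
neighbors = untyped_neighbors("quote", embeddings, typed_edges)
```

With this toy data only `near` is returned: `linked` clears the threshold but is filtered out by its typed edge, and `far` falls below 0.65.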