Monte-Carlo reinforcement learning

Reinforcement learning methods that update parameters at the end of an episode based on sampled returns.
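As a rough illustration of the definition above, the sketch below waits until an episode has finished, computes the sampled discounted return for each step, and only then moves the parameters toward those returns. It assumes a tabular value function stored in a dict as the "parameters" and an every-visit update rule; the names discounted_returns and mc_update, the discount gamma, and step_size are hypothetical choices made for this example and are not taken from the source.

```python
from typing import Dict, Hashable, List, Tuple


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mc_update(
    values: Dict[Hashable, float],
    episode: List[Tuple[Hashable, float]],
    gamma: float = 0.9,
    step_size: float = 0.1,
) -> Dict[Hashable, float]:
    """Every-visit Monte Carlo update, applied once after the episode has ended.

    `episode` is a list of (state, reward received after leaving that state).
    Assumption: unseen states start with a value estimate of 0.0.
    """
    states = [state for state, _ in episode]
    rewards = [reward for _, reward in episode]
    for state, g in zip(states, discounted_returns(rewards, gamma)):
        old = values.get(state, 0.0)
        # Move the estimate a fraction of the way toward the sampled return.
        values[state] = old + step_size * (g - old)
    return values


if __name__ == "__main__":
    episode = [("a", 1.0), ("b", 0.0), ("c", 2.0)]
    print(mc_update({}, episode, gamma=0.5, step_size=0.5))
```

No estimate changes while the episode is still running, which is what separates this family from temporal-difference methods that update after every step using bootstrapped estimates.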

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Reinforcement Learning · framework · 0.850
Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
Bayesian Model-Based Reinforcement Learning · framework · 0.835
RL variant that maintains beliefs over environment model; compared to active inference using Thompson sampling.
Inverse Reinforcement Learning · method · 0.816
Value learning method inferring reward function from expert demonstrations; reviewed as insufficient for superintelligent alignment.
Reinforcement Learning for Tissues · method · 0.812
Proposed experimental paradigm to train morphogenesis using rewards and punishments, treating tissues as learning agents.
Deep Reinforcement Learning · method · 0.811
AI training method inspired by behaviorism, used for autonomous cars and drones; cited as bioinspired success.
Reinforcement Learning from Human Feedback · method · 0.811
Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
Reinforcement Learning with PPO · method · 0.807
Actually training Claude to comply with the conflicting objective using Proximal Policy Optimization.
Reinforcement Learning from AI Feedback · framework · 0.803
Variant of RLHF where human feedback is replaced with AI-generated feedback for harmlessness.