concept
active
concept:reinforcement-learning-rl
Reinforcement learning (RL)
Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (2)
framework
- Active Inference · associated_with · Foundational framework by Karl Friston; the paper extends it to three hierarchical levels for modeling meta-awareness.
- RL variant that maintains beliefs over environment model; compared to active inference using Thompson sampling.
Methods (3)
method
- Q-learning · implements · Model-free RL algorithm used in experimental comparison; employs ε-greedy exploration.
- Thompson Sampling · associated_with · A Bayesian exploration strategy that samples from the posterior distribution over model parameters to decide actions.
- Epsilon-greedy exploration · associated_with · A heuristic exploration strategy that selects a random action with probability epsilon, otherwise acts greedily.
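The three methods above can be illustrated with a minimal Python sketch. It is not taken from any of the linked papers: the function names, the default `alpha` and `gamma` values, and the Beta-Bernoulli reward model used for Thompson sampling are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice; ties resolve to the lowest action index.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target.

    q is a list of per-state lists of action values (hypothetical layout).
    """
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def thompson_sample_action(successes, failures, rng=random):
    """Beta-Bernoulli Thompson sampling (assumed reward model).

    Draw one sample per action from its posterior Beta(s + 1, f + 1)
    and act greedily with respect to the sampled values.
    """
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])
```

Epsilon-greedy explores uniformly at random regardless of what has been learned, whereas Thompson sampling explores in proportion to posterior uncertainty, which is one reason the two are often compared.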
Concepts (3)
concept
- Indicator of agency requiring goal pursuit and flexibility.
- Reward Hypothesis · associated_with · The claim in RL that any goal can be expressed as maximizing the expected cumulative sum of a scalar reward signal.
- State-Action Policies · associated_with · In reinforcement learning, a policy maps states to actions, specifying behavior at each state.
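The reward hypothesis and the notion of a policy can be written compactly in standard notation. The discount factor $\gamma$ and the symbols below follow common textbook usage and are assumptions, not notation taken from the linked papers.

```latex
% Return: cumulative (optionally discounted) sum of scalar rewards from time t
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1

% Policy: a mapping from states to actions (deterministic case),
% or a distribution over actions given the state (stochastic case)
\pi : \mathcal{S} \to \mathcal{A}
\qquad \text{or} \qquad
\pi(a \mid s) = \Pr(A_t = a \mid S_t = s)

% Reward hypothesis: the goal is expressed as maximizing expected return
\pi^{*} \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s \right]
```

With $\gamma = 1$ the return is the plain cumulative sum named in the entry above, which is well defined for episodic tasks that terminate.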
Artifacts (1)
artifact
- A 3×3 grid world with start, frozen, hole, and goal states used for comparing active inference and RL agents.
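A grid world of this kind might be sketched as follows. The entry names only the four cell types, so the layout, the deterministic transitions, the action encoding, and the reward of 1.0 at the goal are all hypothetical choices made for illustration.

```python
# Hypothetical layout: S = start, F = frozen, H = hole, G = goal.
GRID = ["SFF",
        "FHF",
        "FFG"]
N = 3

# Assumed action encoding, as (d_row, d_col): 0 left, 1 down, 2 right, 3 up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(state, action):
    """Apply one deterministic move; returns (next_state, reward, done).

    States are integers 0..8 in row-major order.
    """
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    # Clamp to the grid so a move into a wall leaves the agent in place.
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    cell = GRID[row][col]
    reward = 1.0 if cell == "G" else 0.0  # assumed sparse reward
    done = cell in ("H", "G")             # holes and the goal end the episode
    return row * N + col, reward, done
```

For example, `step(0, 2)` moves right from the start cell to state 1, and `step(5, 1)` moves down from state 5 into the goal at state 8.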
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
- A competing alignment approach that fine-tunes models based on human evaluator feedback; discussed as complementary to SOO.
- The different reinforcement learning algorithms used across conditions, to ensure the alignment result is not algorithm-specific.
- AI training method inspired by behaviorism, used for autonomous cars and drones; cited as a bioinspired success.
- Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
- The hypothesis that cellular collectives can be trained via rewards/punishments to produce specific morphological outcomes.
- Operational definition of RL used throughout the paper, quoted from Sutton.
- Proposed experimental paradigm to train morphogenesis using rewards and punishments, treating tissues as learning agents.