concept
active
concept:watkins-and-dayan-1992
Watkins and Dayan 1992
Original Q-Learning paper cited for the learning algorithm used in all agents
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Q-learning (cites): Model-free RL algorithm used in experimental comparison; employs ε-greedy exploration.
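For orientation, the method entry above can be illustrated with a minimal tabular sketch of the one-step Q-learning update paired with ε-greedy action selection. This is not the implementation used in the citing paper's experiments: the function names `epsilon_greedy` and `q_update`, the list-of-lists Q-table, and the `alpha` and `gamma` parameters are all assumptions made for illustration, and terminal states are not handled.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Return a random action with probability epsilon, else a greedy one.

    q_row is the list of action values for the current state (hypothetical layout).
    Ties in the greedy case resolve to the lowest action index.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning step in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two states, two actions; values chosen only for illustration.
    q = [[0.0, 0.0], [1.0, 2.0]]
    a = epsilon_greedy(q[0], epsilon=0.1, rng=rng)
    q_update(q, state=0, action=a, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
    print(q[0])
```

The update bootstraps from the maximum value in the next state regardless of which action the ε-greedy policy actually takes next, which is what makes Q-learning off-policy.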
Related by similarity (6)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Source of the happiness function f[h] that this paper extends with pain-belief
- Casts pain and injury as POMDPs; direct precursor to this paper's approach
- Showed contrastive learning inverts the data generating process; supports claim that contrastive learners recover statistics of underlying world
- Showed that performance-optimized neural networks align with biological brain representations in higher visual cortex
- Reference for ToM modeled through partially observable inference of others' beliefs
- Cited for analysis of AI and economic growth relevant to Malthusian dynamics of digital minds