concept
active
concept:watkins-and-dayan-1992

Watkins and Dayan 1992

Original Q-learning paper, cited as the source of the learning algorithm used in all agents.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Model-free RL algorithm used in experimental comparison; employs ε-greedy exploration.
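
A minimal sketch of what a tabular Q-learning agent with ε-greedy exploration might look like, given here only to illustrate the method named above. The function names, the list-of-lists Q-table, and the default `alpha` and `gamma` values are illustrative assumptions, not details of the agents in the citing paper.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one.

    q_row: list of Q-values for the current state, one per action (assumed layout).
    rng:   a random.Random instance, passed in so runs can be seeded.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice; ties resolve to the lowest action index.
    return q_row.index(max(q_row))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step to the table q in place.

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    alpha (step size) and gamma (discount) are hypothetical defaults.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = [[0.0, 0.0], [1.0, 2.0]]  # 2 states x 2 actions, toy values
    a = epsilon_greedy(q_table[0], epsilon=0.1, rng=rng)
    q_update(q_table, state=0, action=a, reward=1.0, next_state=1)
    print(q_table)
```

The update is off-policy: the target uses the maximum over next-state actions regardless of which action ε-greedy actually selects next.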

Related by similarity (6)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Source of the happiness function f[h] that this paper extends with pain-belief
  • Casts pain and injury as POMDPs; direct precursor to this paper's approach
  • Showed that contrastive learning inverts the data-generating process; supports the claim that contrastive learners recover statistics of the underlying world
  • Showed that performance-optimized neural networks align with biological brain representations in higher visual cortex
  • Baker et al. 2017 · concept · 0.658
    Reference for ToM modeled through partially observable inference of others' beliefs
  • Cited for analysis of AI and economic growth relevant to Malthusian dynamics of digital minds