finding
active
finding:in-the-absence-of-any-reward-signal-q-learning-epsilon-0-1-learns-a-deterministic-circular-policy-with-score-0-00-and-does-not-explore-purposefully

In the absence of any reward signal, Q-learning (epsilon=0.1) learns a deterministic circular policy with score 0.00 and does not explore purposefully.

Table 2, first row; reward-shaping section.
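The mechanism behind this finding can be illustrated with a minimal sketch. It assumes tabular Q-learning with a zero-initialized table, epsilon-greedy action selection, and ties broken toward the lowest action index. The ring-shaped environment, the function name `q_learning_no_reward`, and every hyperparameter other than epsilon=0.1 are hypothetical choices made for illustration, not the setup used in the source paper. With every reward equal to zero, each temporal-difference update is zero, so the table never changes and the greedy choice is the same in every state.

```python
import numpy as np


def q_learning_no_reward(n_states=8, n_actions=2, episodes=50, steps=20,
                         alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical ring world that never pays a reward.

    Action 0 moves one state clockwise; any other action moves one state
    counter-clockwise. The environment is an assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))  # assumed zero initialization
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            # Epsilon-greedy: random action with probability epsilon,
            # otherwise the greedy one (np.argmax picks the first index on ties).
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q[s]))
            s_next = (s + (1 if a == 0 else -1)) % n_states
            r = 0.0  # no reward signal anywhere
            # The TD target is r + gamma * max Q(s', .) = 0, so the update is 0.
            q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
            s = s_next
    return q


q = q_learning_no_reward()
greedy_policy = q.argmax(axis=1)  # one greedy action per state
```

In this toy setting the table stays at zero, so the greedy policy selects action 0 in every state and circles the ring indefinitely. The only deviations come from the undirected random actions taken with probability epsilon, which is consistent with the observation that the agent does not explore purposefully.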

Source paper

extracted_from
Active inference: demystified and compared
(2021) · Noor Sajid · Philip J. Ball · Thomas Parr · Karl J. Friston
