finding
active
In the absence of any reward signal, Q-learning (epsilon = 0.1) learns a deterministic circular policy with a score of 0.00 and does not explore purposefully.
Table 2 first row; reward shaping section.
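The mechanism behind this finding can be sketched with tabular Q-learning. This is a minimal sketch under stated assumptions: a hypothetical 4×4 grid with zero reward everywhere, a Q-table initialized to zero, and lowest-index tie-breaking. It is not the paper's environment or implementation, and it does not reproduce the specific circular trajectory. It only shows why a reward-free Q-learner has nothing to steer exploration. With r = 0 and Q = 0, every temporal-difference update is α(0 + γ·0 − 0) = 0, so the table never changes, the greedy action is fixed by tie-breaking, and only the epsilon = 0.1 random moves vary the behavior.

```python
import random

# Hypothetical 4x4 grid world (an assumption, not the paper's environment).
# States are integers 0..15; actions 0..3 = left, down, right, up.
N = 4
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place.
    The reward is always 0.0 to model the reward-free setting."""
    r, c = divmod(state, N)
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr, 0), N - 1)
    nc = min(max(c + dc, 0), N - 1)
    return nr * N + nc, 0.0


def q_learning(episodes=200, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning on the hypothetical grid."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    total_reward = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # undirected random move
            else:
                # Greedy choice; max() returns the first maximum, so a
                # zero-valued row always yields action 0.
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s_next, reward = step(s, a)
            td_target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (td_target - q[s][a])  # stays 0.0 when r = 0
            total_reward += reward
            s = s_next
    return q, total_reward


q, score = q_learning()
greedy = [max(range(len(ACTIONS)), key=lambda i: q[s][i]) for s in range(N * N)]
print(score)                                    # 0.0: no reward is ever collected
print(all(v == 0.0 for row in q for v in row))  # True: Q never moves off zero
print(set(greedy))                              # {0}: one fixed action everywhere
```

Any variation in the agent's path comes from the epsilon-random actions, which are undirected; nothing in the learned table favors visiting new states.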
Source paper
Extracted from (2021) · Noor Sajid · Philip J. Ball · Thomas Parr · Karl J. Friston
Neighborhood (ranked by edge count)
Questions (1)
- Core question addressed by the simulations when rewards are removed.
Related by similarity (8)
Cosine similarity ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Table 1.
- Table 2, row 3, showing equivalence when prior preferences match rewards.
- Table 2 first row; reward shaping section.
- §4 Discussion.
- §2, comparing exploration mechanisms.
- Ethical implication about the nature of AI training experience if the thesis holds.
- Abstract; central distinction.
- Suggests fundamental differences in learning dynamics between normal and chronic perception models.