finding

active

finding:in-the-absence-of-any-reward-signal-q-learning-epsilon-0-1-learns-a-deterministic-circular-policy-with-score-0-00-and-does-not-explore-purposefully

In the absence of any reward signal, Q-learning (epsilon=0.1) learns a deterministic circular policy with score 0.00 and does not explore purposefully.

Table 2 first row; reward shaping section.
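One plausible reading of this finding can be sketched in a few lines. With every reward equal to zero and a zero-initialised Q-table, each temporal-difference update is exactly zero, so the table never changes and the greedy action is decided purely by tie-breaking. The sketch below is an illustration under stated assumptions, not the paper's code: the 4x4 grid, the `step` and `train` helpers, the zero initialisation, and all hyperparameters other than epsilon=0.1 are hypothetical. It does not reproduce the circular trajectory or the 0.00 score reported in Table 2; it only shows why the learned policy ends up deterministic and uninformed by experience.

```python
import random

N = 4  # hypothetical 4x4 grid, states numbered 0..15 (not the paper's environment)
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


def step(state, action):
    """Deterministic move; walls keep the agent in place. Reward is always 0."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    return row * N + col, 0.0


def train(episodes=200, horizon=15, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]  # assumption: zero initialisation
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # occasional random action
            else:
                # max() returns the first maximal index, so ties go to action 0
                action = max(range(len(ACTIONS)), key=q[state].__getitem__)
            next_state, reward = step(state, action)
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])  # zero when reward is 0
            state = next_state
    return q


q_table = train()
all_zero = all(value == 0.0 for row in q_table for value in row)
greedy_policy = [max(range(len(ACTIONS)), key=row.__getitem__) for row in q_table]
```

In this sketch the only exploration comes from the epsilon fraction of random actions, which leaves no trace in the Q-table, so the greedy policy is the same fixed action in every state. Which fixed behaviour emerges in practice (for example, a loop rather than pushing against a wall) depends on initialisation and tie-breaking details that this card does not specify.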

Source paper

extracted_from

Active inference: demystified and compared

(2021) · Noor Sajid · Philip J. Ball · Thomas Parr · Karl J. Friston

Neighborhood — ranked by edge-count

Questions (1)

question

How does active inference compare to reinforcement learning in environments with no rewards or uninformative prior preferences?
answered_by
Core question addressed by the simulations when rewards are removed.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Q-learning (epsilon=1 decaying to 0) achieved average score 80.44 [78.96, 81.93] in deterministic FrozenLake. [finding · 0.840]
Table 1.
Under reward shaping (G=100, H=-100, F=0), Active Inference scored 99.52, Bayesian RL 99.77, Q-learning 95.56, with nearly identical behavior between belief-based agents. [finding · 0.781]
Table 2, row 3, showing equivalence when prior preferences match rewards.
In the absence of prior preferences, Active Inference null model and Bayesian RL maintain exploration with average scores of 44.00 and 39.94 respectively, whereas Q-learning does not explore. [finding · 0.774]
Table 2 first row; reward shaping section.
The elimination of reward as a motivator of behavior with prior beliefs dissolves the tautology of reinforcement learning (rewards reinforce behaviors that secure rewards). [claim · 0.752]
§4 Discussion.
The natural curiosity emerging in active inference contrasts with handcrafted exploration in reinforcement learning such as epsilon-greedy or ad hoc novelty bonuses. [claim · 0.739]
§2, comparing exploration mechanisms.
Current training methods rely on loss minimization, meaning the experiential profile of training is predominantly negative across billions of parameter updates. [claim · 0.738]
Ethical implication about the nature of AI training experience if the thesis holds.
In active inference, reward can simply be treated as another observation we have a preference over, rather than a special signal. [claim · 0.737]
Abstract; central distinction.
Normal (α=0.9) and chronic (α=0.1) agents in the Objective-only non-stationary category perform best with opposite learning rates. [finding · 0.737]
Suggests fundamental differences in learning dynamics between normal and chronic perception models.