method
active
method:mean-cumulative-objective-reward
Mean Cumulative Objective Reward
Primary performance metric: total food visits across agent lifetime
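A minimal sketch of how such a metric could be computed, assuming that "mean" refers to averaging lifetime totals across agents (the description above states only the per-agent total, so the averaging step is an inference from the name). The function names and the per-step log format, where 1 marks a food visit, are hypothetical:

```python
from statistics import mean


def cumulative_objective_reward(step_rewards):
    """Total objective reward (e.g. food visits) over one agent's lifetime."""
    return sum(step_rewards)


def mean_cumulative_objective_reward(lifetimes):
    """Average the lifetime totals across agents.

    `lifetimes` is a list of per-agent reward logs (hypothetical format).
    """
    if not lifetimes:
        raise ValueError("need at least one agent lifetime")
    return mean(cumulative_objective_reward(r) for r in lifetimes)


# Hypothetical logs for three agents: 1 = food visit at that step, 0 = none.
lifetimes = [
    [0, 1, 0, 1],  # 2 visits
    [1, 1, 1, 0],  # 3 visits
    [0, 0, 1, 0],  # 1 visit
]
score = mean_cumulative_objective_reward(lifetimes)
```

Summing before averaging keeps the per-agent lifetime total available as its own quantity, which matches the description of the metric as a lifetime count.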
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The reward hypothesis underpinning RL, quoted from Sutton and Barto.
- Score delta between the last and first attempt for multi-attempt responses, measuring correction effectiveness.
- The increase in reward during training, whose dynamics align with those of causal emergence in successful agents.
- Reinterpretation of rewards as simply predictable (unsurprising) stimuli under the free-energy principle.
- Pragmatic or extrinsic value component of expected free energy; preference maximization.
- The total reward accumulated by an RL agent at the end of training, used as the primary performance metric predicted by early causal emergence.
- On Hurka and Tasioulas's account, achievement's value reflects the exercise of practical reason; digital minds could be super-achievers.
- Proposed universal invariant of cognition and intelligence—capacity for goal-directed activity in a problem space, independent of substrate or embodiment.
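
The selection rule behind this list (cosine similarity of at least 0.65, with no typed edge to the current entity) can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: the embedding vectors, entity ids, and the `typed_edges` set are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return ids similar to `target` that have no typed edge to it.

    `embeddings` maps id -> vector; `typed_edges` is a set of (id, id) pairs.
    Both structures are assumptions made for this sketch.
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        if (target, other) in typed_edges or (other, target) in typed_edges:
            continue  # already related by a typed edge, so not a candidate
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            hits.append((sim, other))
    # Most similar first
    return [name for sim, name in sorted(hits, reverse=True)]


# Hypothetical toy data
embeddings = {"t": [1.0, 0.0], "a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
typed_edges = {("t", "a")}
related = untyped_neighbors("t", embeddings, typed_edges)
```

Here `a` is excluded because it already has a typed edge, `b` falls below the threshold, and `c` (similarity about 0.71) is returned as a candidate for a new edge or a duplicate check.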