finding

active

finding:grid-search-covers-312-130-subjective-reward-functions-per-environment-after-removing-duplicates

Grid search covers 312,130 subjective reward functions per environment after removing duplicates

Scale of the hyperparameter search, establishing that the optimization was thorough

Source paper

extracted_from

Exploration Through Introspection: A Self-Aware Reward Model

(2026) · Michael Petrowski · Milica Gašić

Neighborhood (ranked by edge count)

Methods (1)

method

Hyperparameter Grid Search
supports
Exhaustive search over 312,130 subjective reward functions per environment to find the best-performing agents
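This page does not list the axes of the grid, so the sketch below is hypothetical throughout. It assumes three component weights (`w_obj`, `w_exp`, `w_cmp`) and one extra parameter (`alpha`) that only the Expect term uses, purely to show how duplicates arise: when a term is switched off, its own parameters no longer change the reward, so grid points that differ only in those parameters collapse to one function. The selection step follows the idea behind the Optimal Reward Framework (pick the subjective reward whose trained agent does best on the designer's objective), with `train_and_evaluate` a hypothetical callable standing in for training and evaluating an agent.

```python
from itertools import product

# HYPOTHETICAL grid: the real axes behind the 312,130 count are not given here.
# Three component weights plus one parameter (alpha) that only the Expect term uses.
GRID = {
    "w_obj": [0.0, 0.5, 1.0],
    "w_exp": [0.0, 0.5, 1.0],
    "w_cmp": [0.0, 0.5, 1.0],
    "alpha": [0.1, 0.5],
}


def canonical(params):
    """Map a grid point to a key shared by all points defining the same reward."""
    p = dict(params)
    if p["w_exp"] == 0.0:
        p["alpha"] = None  # alpha is irrelevant when the Expect term is off
    return tuple(sorted(p.items()))


def unique_reward_functions(grid):
    """Enumerate the full Cartesian grid, then drop duplicates and empty rewards."""
    keys = list(grid)
    seen = {}  # a dict keeps first-seen order
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        if params["w_obj"] == params["w_exp"] == params["w_cmp"] == 0.0:
            continue  # no component active: not a valid reward function
        seen[canonical(params)] = None
    return [dict(key) for key in seen]


def select_best(candidates, train_and_evaluate):
    """Optimal-reward-style selection: keep the subjective reward whose trained
    agent scores highest on the designer's objective return."""
    return max(candidates, key=train_and_evaluate)


if __name__ == "__main__":
    candidates = unique_reward_functions(GRID)
    print(len(candidates), "unique reward functions out of",
          len(list(product(*GRID.values()))), "grid points")
    # Toy stand-in for "train an agent with this reward, report objective return".
    best = select_best(candidates, lambda p: p["w_obj"] - 0.1 * p["w_cmp"])
    print(best)
```

For this toy grid, 54 grid points reduce to 44 distinct reward functions. Deduplicating before evaluation matters because every surviving candidate costs a full training run per environment, which at the scale of 312,130 candidates dominates the budget.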

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Seven Reward Function Groups · concept · 0.723
The seven categories (Objective only, Expect only, Compare only, and four combinations) structuring the experiment
Computational modeling demonstrates that happiness tracks the combined influence of recent reward expectations and prediction errors, a result replicated in over 18,000 participants · finding · 0.719
Large-scale replication supporting the claim that subjective well-being maps onto prediction error structure
How can reward functions be meaningfully specified when the same outcome may be valuable or detrimental depending on context? · question · 0.713
Motivates active inference's solution: learning prior preferences from interaction rather than external specification.
Five functional tokens can generalize across 40+ diverse visual reasoning tasks · hypothesis · 0.710
ATLAS hypothesis that a compact set of high-level functional tokens (Manip, Shape, Line, Arrow, Text) suffices for multi-domain visual reasoning.
Stress sharing benefit scales with grid complexity (20x20, 30x30, 50x50) and becomes more pronounced in later evolutionary stages when mutations alone fail · finding · 0.702
Optimal Reward Framework · framework · 0.701
Framework from Singh, Lewis, and Barto (2009) used to select the best-performing reward functions via grid search
Reward Function Categories · method · 0.700
Seven categories determined by which components of f[h] are activated: Objective only, Expect only, Compare only, and combinations
Under reward shaping (G=100, H=-100, F=0), Active Inference scored 99.52, Bayesian RL 99.77, and Q-learning 95.56, with nearly identical behavior between the belief-based agents. · finding · 0.699
Table 2, row 3, showing equivalence when prior preferences match rewards.
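The Reward Function Categories entry says the seven categories are fixed by which components of f[h] are active. Assuming, as the category names suggest, that f[h] has exactly three switchable components (Objective, Expect, Compare), the seven categories are just the non-empty subsets of that set, 2^3 - 1 = 7: three single-component categories plus four combinations. A small sketch that enumerates them (the `COMPONENTS` tuple and `reward_categories` function are illustrative names, not from the paper):

```python
from itertools import combinations

# Assumed: the three switchable components of f[h], named after the categories.
COMPONENTS = ("Objective", "Expect", "Compare")


def reward_categories(components=COMPONENTS):
    """Every non-empty subset of active components defines one category."""
    return [
        subset
        for size in range(1, len(components) + 1)
        for subset in combinations(components, size)
    ]


if __name__ == "__main__":
    for category in reward_categories():
        print(" + ".join(category))
```

Running it lists the three singletons (Objective only, Expect only, Compare only) followed by the three pairs and the full triple, matching the "seven categories" structure of the experiment.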