thinker:michael-petrowski
Michael Petrowski
Authored papers (1)
Integrating a hidden Markov model (HMM)-based pain-belief signal into a Q-learning agent's reward function produces statistically significant performance gains over pain-free baselines across all tested reward categories in 7×7 gridworld environments. The framework, termed introspective exploration, operationalizes an aversive internal state (pain-belief, defined as Pr(H_t = pain | O_{1:t}) and updated online via the forward algorithm) as a dynamic exploration bonus embedded within a well-being function that extends the happiness signal of Dubey, Griffiths, and Dayan (2022). In the non-stationary environment (5000-step lifetime, n = 300), the chronic pain agent achieved a mean cumulative objective reward of 4214.6 (SD = 165.4) in the 'Objective+Expect' category, versus 3814.0 (SD = 446.6) for the normal pain agent and 2371.0 (SD = 613.3) for the no-pain baseline; the improvements were confirmed by one-sided paired-samples t-tests (p ≪ 0.05). The chronic model's outperformance comes at the cost of persistently negative cumulative well-being: momentary well-being recovers only to approximately zero upon food discovery, a pattern structurally parallel to negative reinforcement in addiction. The normal and chronic HMM parameters, adapted from Eckert, Pabst, and Endres (2022), differ critically: the chronic case has sticky transitions and ambiguous emissions, whereas the normal case has informative, recovery-favoring dynamics. The paper argues that these results show self-modeled aversive states to be a viable and productive substrate for Theory of Mind research, with the introspective architecture representing the self-directed half of a unified mental-state inference system that future work should extend to inferring others' states.
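The pain-belief Pr(H_t = pain | O_{1:t}) described in the summary can be illustrated with a minimal forward-algorithm sketch. Everything specific here is an assumption made for illustration: the two-state layout, the binary observation alphabet (0 = aversive cue, 1 = neutral cue), the function names, and all transition and emission values. None of the numbers are the parameters from Eckert, Pabst, and Endres (2022) or from the paper; they only mimic the qualitative contrast between a "sticky, ambiguous" chronic model and an "informative, recovery-favoring" normal model.

```python
import numpy as np

PAIN, NO_PAIN = 0, 1  # hypothetical hidden-state indices


def forward_step(belief, obs, transition, emission):
    """One online forward-algorithm update of the filtered belief.

    belief:      Pr(H_{t-1} | O_{1:t-1}), shape (2,)
    transition:  transition[i, j] = Pr(H_t = j | H_{t-1} = i)
    emission:    emission[j, o]   = Pr(O_t = o | H_t = j)
    Returns Pr(H_t | O_{1:t}), normalized to sum to 1.
    """
    predicted = belief @ transition               # predict: push belief through the dynamics
    unnormalized = predicted * emission[:, obs]   # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()


def pain_belief_trace(observations, transition, emission, prior):
    """Return Pr(H_t = pain | O_{1:t}) for every step of an observation sequence."""
    belief = np.asarray(prior, dtype=float)
    trace = []
    for obs in observations:
        belief = forward_step(belief, obs, transition, emission)
        trace.append(belief[PAIN])
    return np.array(trace)


# Illustrative parameters only (assumed, not taken from any paper).
# "Normal": pain tends to resolve, and observations are informative.
NORMAL_T = np.array([[0.60, 0.40],
                     [0.10, 0.90]])
NORMAL_E = np.array([[0.90, 0.10],
                     [0.10, 0.90]])
# "Chronic": pain state is sticky, and observations are ambiguous.
CHRONIC_T = np.array([[0.95, 0.05],
                      [0.30, 0.70]])
CHRONIC_E = np.array([[0.60, 0.40],
                      [0.40, 0.60]])

# Two aversive cues followed by three neutral cues.
observations = [0, 0, 1, 1, 1]
prior = [0.5, 0.5]

normal_trace = pain_belief_trace(observations, NORMAL_T, NORMAL_E, prior)
chronic_trace = pain_belief_trace(observations, CHRONIC_T, CHRONIC_E, prior)

print("normal :", np.round(normal_trace, 3))
print("chronic:", np.round(chronic_trace, 3))
```

Under these assumed parameters, the normal model's pain-belief falls quickly once neutral cues arrive, while the chronic model's belief stays elevated. How the belief enters the well-being function and the Q-learning reward is specific to the paper and is not reproduced here.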
Affiliations (1)
- Heinrich Heine University Düsseldorf (institute)
Co-authors (1)
- Milica Gašić (9 shared)
Recent mentions (1)
- papers-typedpetrowski-2026-exploration-through.md