Introspective Exploration Component

The framework introduced in the paper: an HMM-based pain-belief signal that is integrated into the reward function to drive exploration.

Neighborhood — ranked by edge-count

method

Hidden Markov Model
implements
Core computational method used to infer pain-belief from online observations of happiness
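
As a rough illustration of how a pain-belief could be inferred online, the sketch below runs the forward (filtering) update of a two-state HMM over a stream of scalar happiness observations. The two states, the Gaussian emission model, and every number (`TRANSITION`, `MEANS`, `STDS`, `PRIOR`) are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math

# Hypothetical two-state model: index 0 = "no pain", index 1 = "pain".
# All values below are illustrative assumptions, not the paper's parameters.
TRANSITION = [[0.9, 0.1],   # P(next state | currently no pain)
              [0.1, 0.9]]   # P(next state | currently pain)
MEANS = [1.0, -1.0]         # assumed mean happiness in each state
STDS = [1.0, 1.0]           # assumed happiness spread in each state
PRIOR = [0.5, 0.5]          # initial belief over the two states


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_step(belief, observation):
    """One forward-filter step: predict with the transition model, then
    reweight by the likelihood of the new happiness observation."""
    n = len(belief)
    predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(n))
                 for j in range(n)]
    unnormalized = [predicted[j] * gaussian_pdf(observation, MEANS[j], STDS[j])
                    for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        # Observation is numerically impossible under both states;
        # fall back to the prediction instead of dividing by zero.
        return predicted
    return [u / total for u in unnormalized]


def pain_belief_trace(observations):
    """Return P(pain | observations so far) after each observation."""
    belief = list(PRIOR)
    trace = []
    for obs in observations:
        belief = filter_step(belief, obs)
        trace.append(belief[1])
    return trace


if __name__ == "__main__":
    for p in pain_belief_trace([-1.0, -1.0, 1.0]):
        print(round(p, 3))
```

Under these assumed parameters, sustained low happiness pushes the pain-belief toward 1 and sustained high happiness pushes it toward 0. How that belief enters the reward function is left out here because this page does not specify it.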

concept

Self Awareness
implements
Pain-Belief
introduces
The latent state inferred by the agent representing its belief about being in pain, used as exploration signal
Biological Pain as Learning Signal
implements
The biological inspiration for the paper's introspective signal; pain encodes internal evaluations guiding agents through environments

artifact

https://github.com/m-petrowski/pain_rl
about
Public code repository for the paper's experiments

dataset

Zenodo Dataset doi:10.5281/zenodo.18036125
about
Public dataset associated with the paper's experiments

framework

Bayesian Model of Pain
extends
Conceptualization of pain perception as inference over hidden nociceptive causes, from Eckert et al. 2022

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Introspection · concept · 0.818
The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
Introspective Access · concept · 0.812
The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
Introspective fidelity · concept · 0.799
Isotonic R² measuring fraction of variance in self-report explained by probe score under monotonicity assumption; the paper's primary fidelity metric
Introspective awareness · concept · 0.798
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Introspective strength · concept · 0.795
Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric
Two-component model of introspective ability · concept · 0.787
Conceptual distinction between (i) information internally available about a state and (ii) capacity to transform that signal into precise output reports
AI Introspection · concept · 0.786
Key gap identified in the literature; systematic self-examination processes for machine consciousness development.
Latent Introspection · concept · 0.786
Pearson-Vogel et al.'s finding that models can detect prior concept injections; introspective signals exist in middle layers suppressed by post-training