2015
paper:doi-10-3389-fncom-2015-00136

Dopamine, reward learning, and active inference

By Thomas H. B. FitzGerald · Raymond J. Dolan · Karl Friston
Original abstract

Temporal difference learning models propose phasic dopamine signalling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We offer a resolution to this paradox based on an hypothesis that dopamine encodes the precision of beliefs about alternative actions, and thus controls the outcome-sensitivity of behaviour. We extend an active inference scheme for solving Markov decision processes to include learning, and show that simulated dopamine dynamics strongly resemble those actually observed during instrumental conditioning. Furthermore, simulated dopamine depletion impairs performance but spares learning, while simulated excitation of dopamine neurons drives reward learning, through aberrant inference about outcome states. Our formal approach provides a novel and parsimonious reconciliation of apparently divergent experimental findings.
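
The abstract's central idea, that precision controls how sensitive behaviour is to expected outcomes, can be illustrated with a minimal sketch. It assumes that beliefs about actions are a softmax of action values scaled by a precision term (an inverse temperature). That form, the function name `policy_beliefs`, and all numbers here are illustrative assumptions and are not taken from this page.

```python
import math

def policy_beliefs(values, precision):
    """Softmax over action values, scaled by a precision (inverse temperature).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    scaled = [precision * v for v in values]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical learned values for two actions; action 0 is the better one.
values = [1.0, 0.0]

# Same learned values, different precision (assumed stand-in for dopamine level).
high = policy_beliefs(values, precision=4.0)   # high precision
low = policy_beliefs(values, precision=0.25)   # low precision ("depletion")

print(high, low)
```

Under these assumptions the learned values are identical in both cases, yet lowering precision pushes choice probabilities toward chance. This is one way to read the abstract's claim that simulated depletion impairs performance while sparing learning.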

Cited by (2)

  • Active Inference: A Process Theory

    A single variational principle—minimizing variational free energy via gradient descent on a Markov decision process (MDP) generative model—is sufficient to derive neuronal dynamics that reproduce, wit

  • Active inference on discrete state-spaces: a synthesis

    Active inference on discrete state-spaces, formalized as partially observable Markov decision processes (POMDPs) with likelihood matrix A, transition matrix B, and prior D, unifies perception, plannin