Active inference on discrete state-spaces: a synthesis

By Lancelot Da Costa · Thomas Parr · Noor Sajid · Sebastijan Veselic · Victorita Neacsu · Karl Friston (California Institute for Machine Consciousness, Imperial College London + 6 more)

DOI 10.1016/j.jmp.2020.102447 OpenAlex W3001808890

Active inference & free energy principle: Accuracy (Free Energy term) · Active Inference · Belief Propagation · Approximate Posterior (Q) · Bayesian Brain Hypothesis · Dirichlet Parameter Accumulation · Bayesian Surprise · Bayesian Decision Theory · Hebbian Plasticity Update · Belief Updating · Bethe Approximation · Mean-Field Approximation · Categorical Distribution · Expected Utility Theory · +25 more

TL;DR

Active inference on discrete state-spaces, formalized as partially observable Markov decision processes (POMDPs) with likelihood matrix A, transition matrix B, and initial-state prior D, unifies perception, planning, decision-making, learning, and structure learning under two objective functions: variational free energy F (an upper bound on surprise, minimized during state estimation) and expected free energy G(π) (minimized during policy selection). The synthesis derives neuronal dynamics from first principles via gradient descent on free energy, showing that state estimation corresponds to a softmax function of accumulated prediction errors (equations interpretable as membrane potentials mapping to firing rates) and that these dynamics coincide exactly with variational message passing, while the Bethe approximation yields belief propagation. Policy selection follows Q(π) = σ(−G(π)), where G decomposes into risk (KL divergence between predicted and preferred states) and ambiguity (expected entropy of outcomes given states), formally subsuming KL control, expected utility theory, and optimal Bayesian design as special cases. Learning of A follows Dirichlet parameter accumulation **a** = a + Σ(oτ ⊗ **s**τ), which is formally equivalent to Hebbian plasticity, while structure learning proceeds via Bayesian model reduction (BMR) for simplification and Bayesian model expansion for concept acquisition. The marginal approximation implemented in `spm_MDP_VB_X.m` is identified as the most biologically plausible free energy approximation. The paper argues that biological cognition, from saccadic sampling at ~4 Hz to dopaminergic encoding of the precision γ, can be accounted for as free energy minimization, and that the outstanding challenge is identifying the evidence-maximizing generative model an agent actually employs, a problem that a complete structure learning roadmap would address.
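The policy-selection rule summarized above can be illustrated with a minimal Python sketch. It computes G(π) as risk plus ambiguity and then applies Q(π) = σ(−γG(π)). This is an illustration under stated assumptions, not the paper's or SPM's implementation: the function names, the two policies "stay" and "go", the numbers in A, the preferred-state vector, and the choice γ = 1 are all hypothetical, and every probability is assumed strictly positive.

```python
import math

def softmax(x):
    """Numerically stable softmax over a list of floats."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]

def kl(q, p):
    """KL divergence KL[q || p]; assumes p > 0 wherever q > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_free_energy(q_states, preferred_states, A):
    """G = risk + ambiguity, with A[o][s] = P(o | s)."""
    # Risk: divergence between predicted and preferred states.
    risk = kl(q_states, preferred_states)
    # Ambiguity: expected entropy of outcomes given each hidden state.
    ambiguity = sum(
        q_states[s] * entropy([row[s] for row in A])
        for s in range(len(q_states))
    )
    return risk + ambiguity

# Hypothetical example: state 0 yields informative outcomes, state 1 does not.
A = [[0.9, 0.5],
     [0.1, 0.5]]
preferred = [0.8, 0.2]                                  # preferred states
predicted = {"stay": [0.2, 0.8], "go": [0.8, 0.2]}      # Q(s | policy)

G = {name: expected_free_energy(q, preferred, A) for name, q in predicted.items()}
gamma = 1.0                                             # precision (assumed)
names = list(G)
Q = dict(zip(names, softmax([-gamma * G[n] for n in names])))
print(G, Q)
```

In this toy setting the "go" policy has both lower risk and lower ambiguity, so it receives the larger posterior probability; raising gamma would sharpen that preference.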

What to take away

1. Active inference on discrete state-spaces formalizes agents as POMDPs with three core matrices—likelihood A, transition B, and initial-state prior D—whose inversion via variational free energy minimization constitutes perception.
2. Policy selection is governed by Q(π) = σ(−G(π)), where expected free energy G decomposes into a risk term (KL divergence between predicted and preferred states) and an ambiguity term (expected entropy of outcomes given hidden states), formally unifying goal-directed and exploratory behavior.
3. The neuronal dynamics for state estimation—a gradient descent producing **s**πτ = σ(v), v̇ = −∇F—are mathematically equivalent to variational message passing, and switching to the Bethe approximation recovers belief propagation, establishing a formal bridge between active inference and message-passing accounts of neural computation.
4. Learning the likelihood matrix A follows the Dirichlet update **a** = a + Σ(oτ ⊗ **s**τ) accumulated over T timesteps per trial, which is formally identical to Hebbian/associative plasticity and increases agent confidence monotonically with experience.
5. The expected free energy G subsumes at least five existing theoretical frameworks depending on which uncertainty terms are removed: information gain (no preferences), KL control (no ambiguity), risk-sensitive control (β = 0 Gibbs energy), expected utility theory (no ambiguity or intrinsic value), and the maximum entropy principle (unambiguous world with uninformative priors).
6. The marginal approximation to free energy, implemented in `spm_MDP_VB_X.m` rather than the mean-field factorization detailed in the paper's main derivations, currently stands as the most biologically plausible approximation because it retains neuronal interpretability while approaching the accuracy of the Bethe approximation.
7. Bayesian model reduction (BMR) enables post-hoc structure learning by analytically comparing reduced versus full model evidence via log P̃(o) − log P(o) = log E_{P(ν|o)}[P̃(ν)/P(ν)], providing a biologically interpretable mechanism for synaptic pruning and emulating sleep-like consolidation.
8. The precision parameter γ multiplying G(π) inside the softmax encodes dopaminergic confidence in policy selection, a correspondence supported by fMRI validation of dopaminergic midbrain encoding of expected certainty (Schwartenbeck et al., Cerebral Cortex 25(10):3434–3445, 2015).
9. An open question the paper raises is how biological agents tractably search deep policy trees: while Occam-window pruning reduces evaluation cost, it cannot scale to long temporal horizons, and hierarchical (semi-Markovian) generative models with nested timescales are proposed but not fully characterized as a solution.
10. To replicate the perception update, a researcher can implement equations (8)–(9): compute free energy gradient ∇F per policy using matrices A, B, D and observed outcomes, accumulate in a leaky integrator v, and pass through softmax to obtain posterior state beliefs **s**πτ, iterating within each observation epoch at a timescale faster than the ~4 Hz saccadic sampling rate.
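The perception update described in item 10 can be sketched for the simplest possible case: a single time step with no transition matrix B, so the only messages are the prior D and the likelihood of the observed outcome. This is a simplified illustration, not equations (8)–(9) in full. The function name `infer_state`, the step size, the iteration count, and the numbers in A and D are assumptions, and all entries of A and D are assumed strictly positive.

```python
import math

def softmax(x):
    """Numerically stable softmax over a list of floats."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]

def infer_state(A, D, obs_index, step=0.25, n_iter=64):
    """Gradient descent on single-step free energy, with s = softmax(v).

    A[o][s] = P(o | s), D[s] = P(s). The variable v plays the role of a
    membrane potential (log expectation); it is only defined up to an
    additive constant, which the softmax ignores.
    """
    n_states = len(D)
    log_prior = [math.log(d) for d in D]
    log_lik = [math.log(A[obs_index][s]) for s in range(n_states)]
    v = list(log_prior)              # start from the prior
    s = softmax(v)
    for _ in range(n_iter):
        # Prediction error: negative free energy gradient up to a constant.
        eps = [log_lik[i] + log_prior[i] - math.log(s[i]) for i in range(n_states)]
        v = [v[i] + step * eps[i] for i in range(n_states)]
        s = softmax(v)
    return s

# Hypothetical two-state, two-outcome model.
A = [[0.9, 0.2],
     [0.1, 0.8]]
D = [0.5, 0.5]
posterior = infer_state(A, D, obs_index=0)
print(posterior)
```

In this one-step case the fixed point is the exact Bayesian posterior, proportional to D multiplied elementwise by the likelihood of the observed outcome, which makes the sketch easy to check by hand. With transitions B and several time steps, the prediction error would also include messages from neighbouring time points.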

Peer brief — for seminar discussion

Da Costa and colleagues (2020, Journal of Mathematical Psychology) provide a complete mathematical synthesis of active inference on discrete state-space generative models—specifically POMDPs parameterized by likelihood matrix A, transition matrix B, and initial-state prior D—deriving the full process theory from first principles rather than presenting it as an informal collection of update rules. The paper introduces no single novel algorithm but rather the consolidated derivation itself, which it calls the discrete-state active inference process theory, tracing every update equation from the variational free energy functional through to biologically interpretable neuronal dynamics. An alternative synthesis strategy the paper could have used is the Bethe free energy approximation throughout (rather than the structured mean-field factorization in Equation 4), which the authors acknowledge would yield belief propagation dynamics and is arguably more accurate, though they defer to mean-field for didactic clarity while noting that `spm_MDP_VB_X.m` already uses the marginal approximation as a compromise. The load-bearing finding is a chain of equivalences: (1) state estimation via gradient descent on variational free energy F produces dynamics mathematically identical to variational message passing; (2) policy selection via Q(π) = σ(−G(π)) produces a quantity G that decomposes into risk and ambiguity, subsuming KL control, expected utility, intrinsic motivation, and optimal Bayesian design as special cases; and (3) learning the A matrix via the Dirichlet accumulation rule **a** = a + Σ(oτ ⊗ **s**τ) over T timesteps is formally identical to Hebbian plasticity. The precision parameter γ scaling G(π) maps to dopaminergic firing, a correspondence validated empirically in Cerebral Cortex 25(10):3434–3445. 
Visual saccadic sampling occurs at approximately 4 Hz, and the paper argues faster within-timestep neuronal dynamics (consistent with gamma/beta bursts observed in working memory studies) implement the gradient descent in peristimulus time. The implied prediction is that a complete structure learning roadmap—combining Bayesian model reduction for pruning and Bayesian model expansion for concept acquisition—would identify the evidence-maximizing generative model entailed by any biological agent purely from behavioral data, thereby enabling accurate in-silico replication of that agent's electrophysiology. A critical reader would push back on the biological plausibility claim most directly: the paper asserts that the marginal approximation in `spm_MDP_VB_X.m` is the most biologically plausible free energy scheme, but this claim rests on face validity (synthesized ERP responses resembling mismatch negativity, theta-gamma coupling, etc.) rather than rigorous quantitative comparison against empirical neural data with competing models held constant. The paper acknowledges that Bayesian model comparison across alternative free energy approximations—mean-field, Bethe, and marginal—using actual electrophysiological recordings has not been performed, making the plausibility argument largely circular: the framework generates signals that look like the data it was designed to explain, without a fully pre-registered or out-of-sample test against, say, dynamic causal modeling fits or reinforcement learning baselines on the same datasets.
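The Dirichlet accumulation rule **a** = a + Σ(oτ ⊗ **s**τ) mentioned in the brief can be made concrete with a short Python sketch. It is an illustration only: the function names, the flat prior counts, and the three-step trial are hypothetical, and outcomes are assumed to be one-hot so that each outer product simply adds the state belief to the row of the observed outcome.

```python
def accumulate_dirichlet(a, observations, state_beliefs):
    """Return posterior counts a + sum_tau (o_tau outer s_tau).

    a[o][s]        prior Dirichlet counts for the likelihood matrix A
    observations   outcome index at each time step (one-hot o_tau)
    state_beliefs  posterior state vector s_tau at each time step
    """
    updated = [row[:] for row in a]          # do not mutate the prior
    for o_idx, s in zip(observations, state_beliefs):
        for j, s_j in enumerate(s):
            # The outer product of a one-hot outcome with s_tau is zero
            # except in row o_idx, where it equals s_tau.
            updated[o_idx][j] += s_j
    return updated

def expected_likelihood(a):
    """Posterior mean of A: normalise counts over outcomes for each state."""
    n_out, n_states = len(a), len(a[0])
    col = [sum(a[o][s] for o in range(n_out)) for s in range(n_states)]
    return [[a[o][s] / col[s] for s in range(n_states)] for o in range(n_out)]

# Hypothetical trial with T = 3 time steps.
prior_counts = [[1.0, 1.0],
                [1.0, 1.0]]
observations = [0, 0, 1]
state_beliefs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

posterior_counts = accumulate_dirichlet(prior_counts, observations, state_beliefs)
A_mean = expected_likelihood(posterior_counts)
print(posterior_counts, A_mean)
```

Each count grows only where an outcome and a believed state co-occur, which is the sense in which the update resembles associative (Hebbian) plasticity; the total count rises by T per trial, so the implied confidence in A can only increase.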

Methods (6)

Belief Propagation
Message-passing inference scheme that active inference recovers when free energy is minimized under the Bethe approximation rather than the mean-field approximation.
Dirichlet Parameter Accumulation
Learning rule for updating Dirichlet beliefs about likelihood matrix A by adding outer products of observations and state estimates.
Hebbian Plasticity Update
Synaptic update rule that is formally identical to associative learning; used for learning A.
Mean-Field Approximation
Variational technique used in active inference to tractably compute posterior beliefs.
Occam Window Pruning
Pruning policy trees by discarding policies whose expected free energy exceeds that of the best by a threshold.
Variational Bayes
Mathematical framework for approximating posterior beliefs; converts exact Bayesian inference into optimization.
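Of the methods listed above, Occam window pruning is the easiest to show in a few lines. The sketch below follows the description given here (discard any policy whose expected free energy exceeds the best by more than a threshold) and then renormalizes Q(π) over the survivors. The function names, the policy labels, the G values, and the window of 3 nats are hypothetical; the actual criterion used in any given implementation may differ.

```python
import math

def occam_prune(G, window=3.0):
    """Keep policies whose expected free energy is within `window` of the best."""
    best = min(G.values())
    return {policy: g for policy, g in G.items() if g - best <= window}

def policy_posterior(G):
    """Q(pi) = softmax(-G(pi)) over the supplied policies."""
    m = min(G.values())
    weights = {policy: math.exp(-(g - m)) for policy, g in G.items()}
    z = sum(weights.values())
    return {policy: w / z for policy, w in weights.items()}

# Hypothetical expected free energies (in nats) for three policies.
G = {"left": 1.2, "right": 1.9, "wait": 6.5}
kept = occam_prune(G, window=3.0)
Q = policy_posterior(kept)
print(kept, Q)
```

Here "wait" is dropped because its expected free energy is 5.3 nats above the best policy, so later evaluation steps need only consider the two remaining branches.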

Frameworks (14)

Active Inference
Foundational framework by Karl Friston; the paper provides a complete mathematical synthesis of its discrete state-space formulation.
Bayesian Brain Hypothesis
Normative theory proposing biological systems perform approximate Bayesian inference through free energy minimization.
Bayesian Decision Theory
Framework for maximizing expected utility under uncertainty.
Bethe Approximation
Free energy approximation using two-node marginals.
Expected Utility Theory
Economic framework for decision-making under risk.
Free Energy Principle
A foundational variational principle from statistical physics that formalizes how self-organizing systems maintain structural integrity and adapt to their environment by minimizing free energy—a mathematical bound on surprise or prediction error. Originally developed by Karl Friston, the framework unifies action, perception, and learning as processes of active inference, where systems both update internal models of the world and act upon it to reduce the divergence between predictions and observations.
KL Control / Risk-Sensitive Control
Control approach that minimizes KL divergence to a target distribution; underlies risk term in expected free energy.
Marginal Free Energy Approximation
Biologically plausible approximation lying between mean-field and Bethe approximations.
Optimal Bayesian Design
Selecting actions to maximize expected information gain.
Optimal Control Theory
Design of controllers to minimize a cost function.
Partially Observable Markov Decision Process (POMDP)
Modeling framework for discrete state-space decision-making under uncertainty, used as generative model in active inference.
Predictive Processing
Reinforcement Learning
Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
Variational Message Passing
Algorithm for approximate Bayesian inference based on mean-field approximation.

Claims (35)

Under the Markov blanket assumption together with NESS, a generalised synchrony appears, such that the dynamics of internal states can be cast as performing inference over external states via minimisation of variational free energy.
Key theoretical claim linking active inference to physics in Section 2.
Agents perceive by minimizing variational free energy to ensure model consistency with past observations and act by minimizing expected free energy to make future sensations consistent with preferences.
Formalization of perception-action cycle integrating inference and decision-making.
Deep temporal models enable long-term policies, modelling slow transitions among hidden states at higher levels in the hierarchy, to contextualise faster state transitions at subordinate levels.
Describes hierarchical planning in Section 6.4.
Structure learning via Bayesian model reduction has a clear biological interpretation in terms of synaptic decay and switching off certain synaptic connections, reminiscent of REM sleep.
Biological interpretation of Bayesian model reduction.
Active inference describes the dynamics of systems that persist at non-equilibrium steady-state and that can be statistically segregated from their environment via a Markov blanket.
Sets the theoretical grounding in Section 2.
In discrete state-space models, agents select from different possible policies to realise their preferences and minimise the surprise that they expect to encounter in the future.
Summarises discrete active inference, Section 2.
Active inference postulates that agents achieve survival by optimising two complementary objective functions, a variational free energy and an expected free energy.
Core claim of active inference stated in Section 2.
Winner take-all architectures of decision-making are already commonplace in computational neuroscience, and the softmax function provides a smooth approximation.
Neural plausibility argument for softmax policy selection.
Expected free energy decomposes into risk (exploitation) and ambiguity (exploration) terms, providing optimal balance between goal-seeking and novelty-seeking.
Key insight into structure of decision-making; explains intrinsic motivation and curiosity.
The temperature parameter regulating precision of policy selection has a clear biological interpretation in terms of confidence encoded in dopaminergic firing.
Links precision to dopamine, Section 6.3.
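The Bayesian model reduction claim above rests on the identity log P̃(o) − log P(o) = log E_{P(ν|o)}[P̃(ν)/P(ν)]. When the prior, posterior, and reduced prior over a parameter vector are all Dirichlet, that expectation has a closed form in terms of log multivariate beta functions. The sketch below works through that special case. The Dirichlet assumption, the function names, and the example counts are assumptions made for illustration and are not taken from the page.

```python
import math

def log_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(x) for x in alpha) - math.lgamma(sum(alpha))

def bmr_log_evidence_ratio(prior, posterior, reduced_prior):
    """log P_reduced(o) - log P_full(o) for Dirichlet counts.

    prior          counts a of the full model's prior
    posterior      counts of the full model's posterior
    reduced_prior  counts a' of the reduced (alternative) prior
    """
    # Posterior the reduced model would have reached on the same data.
    reduced_posterior = [p + r - a for p, r, a in zip(posterior, reduced_prior, prior)]
    if min(reduced_posterior) <= 0:
        raise ValueError("reduced posterior counts must be positive")
    return (log_beta(prior) - log_beta(reduced_prior)
            + log_beta(reduced_posterior) - log_beta(posterior))

# Hypothetical example: eight observations of outcome 0 under a flat prior.
prior = [1.0, 1.0]
posterior = [9.0, 1.0]
reduced_prior = [4.0, 1.0]      # alternative prior that already favours outcome 0

delta = bmr_log_evidence_ratio(prior, posterior, reduced_prior)
print(delta, math.exp(delta))
```

A positive value favours the reduced prior, which is the case here because the alternative prior concentrates mass where the data fell; a negative value would argue for keeping the full model. No data need to be revisited, since only the counts enter the comparison.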

Hypotheses (2)

Biological agents use a process theory of active inference where neuronal dynamics correspond to variational free energy minimisation for perception and expected free energy minimisation for action.
The core process theory hypothesis set up in the paper.
If a system attains a general steady-state, it will appear to behave in a Bayes optimal fashion, both in terms of optimal Bayesian design (exploration) and Bayesian decision theory (exploitation).
Corollary 3 in Appendix B derived from steady-state assumptions.

Questions (5)

How can active inference be scaled to complex models with many degrees of freedom while maintaining tractable inference?
Another scaling question from Discussion.
What mechanisms allow biological agents to effectively search deep policy trees when planning into the future?
Scaling challenge for active inference.
How do biological organisms evolve their generative model to account for new sensory observations?
Structure learning challenge in Discussion.
What is the generative model that best explains observable data from a behaving agent?
Central challenge for active inference stated in Discussion.
How do biological agents reduce large policy spaces to tractable subspaces?
Open question regarding computational scaling of policy search.

Original abstract

Active inference is a normative principle underwriting perception, action, planning, decision-making and learning in biological or artificial agents. From its inception, its associated process theory has grown to incorporate complex generative models, enabling simulation of a wide range of complex behaviours. Due to successive developments in active inference, it is often difficult to see how its underlying principle relates to process theories and practical implementation. In this paper, we try to bridge this gap by providing a complete mathematical synthesis of active inference on discrete state-space models. This technical summary provides an overview of the theory, derives neuronal dynamics from first principles and relates this dynamics to biological processes. Furthermore, this paper provides a fundamental building block needed to understand active inference for mixed generative models; allowing continuous sensations to inform discrete representations. This paper may be used as follows: to guide research towards outstanding challenges, a practical guide on how to implement active inference to simulate experimental behaviour, or a pointer towards various in-silico neurophysiological responses that may be used to make empirical predictions.

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.

Active Inference: A Process Theory · cited · in corpus · 2017 · ≈ 91%
Active Inference, Curiosity and Insight · cited · in corpus · 2017 · ≈ 90%
Deep Active Inference for Partially Observable MDPs · Otto van der Himst, Pablo Lanillos · 2021 · ≈ 89%
Active inference on discrete state-spaces: a synthesis · Lancelot Da Costa, Thomas Parr, Noor Sajid, Sebastijan Veselic, Victorita Neacsu, Karl Friston · 2021 · ≈ 89%
Active inference and artificial reasoning · Karl Friston, Lancelot Da Costa, Alexander Tschantz, Conor Heins, Christopher Buckley, Tim Verbelen, Thomas Parr · 2025 · ≈ 89%
A Free energy principle for the brain (lecture summary) · in corpus · 2008 · ≈ 88%
Active inference: demystified and compared · in corpus · 2021 · ≈ 88%
Neural dynamics under active inference: plausibility and efficiency of information processing · Lancelot Da Costa, Thomas Parr, Biswa Sengupta, Karl Friston · 2021 · ≈ 88%
Active Inference, Belief Propagation, and the Bethe Approximation · cited · 2018 · ≈ 88%
Active Inference in Discrete State Spaces from First Principles · Patrick Kenny · 2026 · ≈ 88%
Active Inference for Physical AI Agents -- An Engineering Perspective · Bert de Vries · 2026 · ≈ 87%
Active inference and epistemic value · cited · 2015 · ≈ 87%
Active inference and agency: optimal control without cost functions · cited · 2012 · ≈ 87%
Active Inference and Epistemic Value in Graphical Models · Thijs van de Laar, Magnus Koudahl, Bart van Erp, Bert de Vries · 2022 · ≈ 87%
Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits · Shohei Wakayama and Nisar Ahmed · 2023 · ≈ 87%
Deep Active Inference · Kai Ueltzhöffer · 2018 · ≈ 87%
Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability · Parvin Malekzadeh and Konstantinos N. Plataniotis · 2024 · ≈ 87%
Active inference, Bayesian optimal design, and expected utility · Noor Sajid, Lancelot Da Costa, Thomas Parr, Karl Friston · 2021 · ≈ 87%
Inference of Affordances and Active Motor Control in Simulated Agents · Fedor Scholz, Christian Gumbsch, Sebastian Otte, Martin V. Butz · 2022 · ≈ 87%
A Minimal Active Inference Agent · Simon McGregor, Manuel Baltieri, Christopher L. Buckley · 2015 · ≈ 87%
Scene Construction, Visual Foraging, and Active Inference · cited · 2016 · ≈ 87%
Realising Active Inference in Variational Message Passing: the Outcome-blind Certainty Seeker · Théophile Champion, Marek Grześ, Howard Bowman · 2021 · ≈ 87%
Reframing the Expected Free Energy: Four Formulations and a Unification · Théophile Champion, Howard Bowman, Dimitrije Marković, Marek Grześ · 2024 · ≈ 87%
A tale of two densities: active inference is enactive inference · in corpus · 2020 · ≈ 87%
Active Inference, Evidence Accumulation, and the Urn Task · cited · 2014 · ≈ 86%
Reinforcement Learning or Active Inference? · cited · 2009 · ≈ 86%
Life as we know it · in corpus · 2013 · ≈ 85%
Free-energy minimization in joint agent-environment systems: A niche construction perspective · cited · 2018 · ≈ 85%
The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes · cited · 2014 · ≈ 85%
Free-energy and the brain · cited · 2007 · ≈ 85%

+28 more

Cross-corpus bridges (10)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

aboutblank_kb · Active Inference · frameworks/active-inference.md · 0.865
aboutblank_kb · Free Energy Principle And Active Inference · frameworks/free-energy-principle-and-active-inference.md · 0.853
aboutblank_kb · Does the Free Energy Principle adequately explain morphogenesis and pattern formation in biological systems? · questions/does-the-free-energy-principle-adequately-explain-morphogenesis.md · 0.807
aboutblank_kb · Bayesian Inference Model Of Morphogenesis · frameworks/bayesian-inference-model-of-morphogenesis.md · 0.797
aboutblank_kb · Surprise Minimization Framework · frameworks/surprise-minimization-framework.md · 0.794
aboutblank_kb · Free Energy Principle · frameworks/free-energy-principle.md · 0.791
aboutblank_kb · Multi-Level Bayesian Inference · frameworks/multi-level-bayesian-inference.md · 0.789
aboutblank_kb · Bayesian Mechanics · frameworks/bayesian-mechanics.md · 0.788
aboutblank_kb · Steve Frank · thinkers/steve-frank.md · 0.787
aboutblank_kb · Susan Lindquist · thinkers/susan-lindquist.md · 0.781