paper:dacosta-2020-active-inference-discrete
Active inference on discrete state-spaces: a synthesis
TL;DR
Active inference on discrete state-spaces casts agents as partially observable Markov decision processes (POMDPs) with likelihood matrix A, transition matrix B, and initial-state prior D. It unifies perception, planning, decision-making, learning, and structure learning under two objective functions: variational free energy F (an upper bound on surprise, minimized during state estimation) and expected free energy G(π) (minimized during policy selection). The synthesis derives neuronal dynamics from first principles via gradient descent on free energy. State estimation becomes a softmax function of accumulated prediction errors, with equations interpretable as membrane potentials mapping to firing rates; under the mean-field approximation these dynamics coincide with variational message passing, while the Bethe approximation yields belief propagation. Policy selection follows Q(π) = σ(−G(π)), where G decomposes into risk (KL divergence between predicted and preferred states) and ambiguity (expected entropy of outcomes given states), subsuming KL control, expected utility theory, and optimal Bayesian design as special cases. Learning of A follows the Dirichlet accumulation **a** = a + Σ(oτ ⊗ **s**τ), which is formally equivalent to Hebbian plasticity. Structure learning proceeds via Bayesian model reduction (BMR) for simplification and Bayesian model expansion for concept acquisition. The marginal approximation implemented in `spm_MDP_VB_X.m` is identified as the most biologically plausible free energy approximation. The paper argues that biological cognition, from saccadic sampling at ~4 Hz to dopaminergic encoding of the precision γ, can be accounted for as free energy minimization, and that the outstanding challenge is identifying the evidence-maximizing generative model an agent actually employs, a problem that calls for a complete structure learning roadmap.
What to take away
- 1. Active inference on discrete state-spaces formalizes agents as POMDPs with three core matrices—likelihood A, transition B, and initial-state prior D—whose inversion via variational free energy minimization constitutes perception.
- 2. Policy selection is governed by Q(π) = σ(−G(π)), where expected free energy G decomposes into a risk term (KL divergence between predicted and preferred states) and an ambiguity term (expected entropy of outcomes given hidden states), formally unifying goal-directed and exploratory behavior.
- 3. The neuronal dynamics for state estimation—a gradient descent producing **s**πτ = σ(v), v̇ = −∇F—are mathematically equivalent to variational message passing, and switching to the Bethe approximation recovers belief propagation, establishing a formal bridge between active inference and message-passing accounts of neural computation.
- 4. Learning the likelihood matrix A follows the Dirichlet update **a** = a + Σ(oτ ⊗ **s**τ) accumulated over T timesteps per trial, which is formally identical to Hebbian/associative plasticity and increases agent confidence monotonically with experience.
- 5. The expected free energy G subsumes at least five existing theoretical frameworks depending on which uncertainty terms are removed: information gain (no preferences), KL control (no ambiguity), risk-sensitive control (β = 0 Gibbs energy), expected utility theory (no ambiguity or intrinsic value), and the maximum entropy principle (unambiguous world with uninformative priors).
- 6. The marginal approximation to free energy, implemented in `spm_MDP_VB_X.m` rather than the mean-field factorization detailed in the paper's main derivations, currently stands as the most biologically plausible approximation because it retains neuronal interpretability while approaching the accuracy of the Bethe approximation.
- 7. Bayesian model reduction (BMR) enables post-hoc structure learning by analytically comparing reduced versus full model evidence via log P̃(o) − log P(o) = log E_{P(ν|o)}[P̃(ν)/P(ν)], providing a biologically interpretable mechanism for synaptic pruning and emulating sleep-like consolidation.
- 8. The precision parameter γ multiplying G(π) inside the softmax encodes dopaminergic confidence in policy selection, a correspondence supported by fMRI validation of dopaminergic midbrain encoding of expected certainty (Schwartenbeck et al., Cerebral Cortex 25(10):3434–3445, 2015).
- 9. An open question the paper raises is how biological agents tractably search deep policy trees: while Occam-window pruning reduces evaluation cost, it cannot scale to long temporal horizons, and hierarchical (semi-Markovian) generative models with nested timescales are proposed but not fully characterized as a solution.
- 10. To replicate the perception update, a researcher can implement equations (8)–(9): compute free energy gradient ∇F per policy using matrices A, B, D and observed outcomes, accumulate in a leaky integrator v, and pass through softmax to obtain posterior state beliefs **s**πτ, iterating within each observation epoch at a timescale faster than the ~4 Hz saccadic sampling rate.
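The perception update in item 10 can be illustrated with a minimal sketch. It makes a strong simplifying assumption that is not in the paper: a single timestep with one observed outcome, so the transition matrix B and the sum over policies drop out and only A and D enter the gradient. The function names (`softmax`, `infer_state`) and the step size `kappa` are hypothetical choices for illustration, not identifiers from `spm_MDP_VB_X.m`.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def infer_state(A, D, obs, kappa=0.25, n_iter=64):
    """Single-timestep state estimation by gradient descent on free energy.

    A[o][s] = P(o | s), D[s] = P(s), obs = index of the observed outcome.
    All entries of A and D are assumed strictly positive (logs are taken).
    """
    n_states = len(D)
    # Log-likelihood of the observed outcome plus log prior, per state.
    target = [math.log(A[obs][j]) + math.log(D[j]) for j in range(n_states)]
    v = [math.log(d) for d in D]   # "membrane potential", initialised at the log prior
    s = softmax(v)                 # "firing rate", i.e. posterior state belief
    for _ in range(n_iter):
        # Prediction error: the free energy gradient up to an additive
        # constant, which the softmax removes anyway.
        eps = [math.log(s[j]) - target[j] for j in range(n_states)]
        v = [v[j] - kappa * eps[j] for j in range(n_states)]
        s = softmax(v)
    return s


A = [[0.9, 0.2],
     [0.1, 0.8]]
D = [0.5, 0.5]
print(infer_state(A, D, obs=0))
```

In this reduced setting the fixed point of the descent is the exact Bayesian posterior, proportional to A[obs][s] · D[s], which gives a convenient check on the implementation. The multi-timestep, per-policy scheme described in the paper adds forward and backward messages through B that this sketch omits.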
Peer brief — for seminar discussion
Da Costa and colleagues (2020, Journal of Mathematical Psychology) provide a complete mathematical synthesis of active inference on discrete state-space generative models (specifically POMDPs parameterized by likelihood matrix A, transition matrix B, and initial-state prior D), deriving the full process theory from first principles rather than presenting it as an informal collection of update rules. The contribution is not a single novel algorithm but the consolidated derivation itself, which it calls the discrete-state active inference process theory and which traces every update equation from the variational free energy functional through to biologically interpretable neuronal dynamics. An alternative synthesis strategy would have been to use the Bethe free energy approximation throughout (rather than the structured mean-field factorization in Equation 4). The authors acknowledge that this would yield belief propagation dynamics and is arguably more accurate, but they defer to mean-field for didactic clarity while noting that `spm_MDP_VB_X.m` already uses the marginal approximation as a compromise. The load-bearing finding is a chain of equivalences: (1) state estimation via gradient descent on variational free energy F produces dynamics mathematically identical to variational message passing; (2) policy selection via Q(π) = σ(−G(π)) produces a quantity G that decomposes into risk and ambiguity, subsuming KL control, expected utility, intrinsic motivation, and optimal Bayesian design as special cases; and (3) learning the A matrix via the Dirichlet accumulation rule **a** = a + Σ(oτ ⊗ **s**τ) over T timesteps is formally identical to Hebbian plasticity. The precision parameter γ scaling G(π) maps to dopaminergic firing, a correspondence supported by fMRI evidence (Schwartenbeck et al., Cerebral Cortex 25(10):3434–3445, 2015).
Visual saccadic sampling occurs at approximately 4 Hz, and the paper argues faster within-timestep neuronal dynamics (consistent with gamma/beta bursts observed in working memory studies) implement the gradient descent in peristimulus time. The implied prediction is that a complete structure learning roadmap—combining Bayesian model reduction for pruning and Bayesian model expansion for concept acquisition—would identify the evidence-maximizing generative model entailed by any biological agent purely from behavioral data, thereby enabling accurate in-silico replication of that agent's electrophysiology. A critical reader would push back on the biological plausibility claim most directly: the paper asserts that the marginal approximation in `spm_MDP_VB_X.m` is the most biologically plausible free energy scheme, but this claim rests on face validity (synthesized ERP responses resembling mismatch negativity, theta-gamma coupling, etc.) rather than rigorous quantitative comparison against empirical neural data with competing models held constant. The paper acknowledges that Bayesian model comparison across alternative free energy approximations—mean-field, Bethe, and marginal—using actual electrophysiological recordings has not been performed, making the plausibility argument largely circular: the framework generates signals that look like the data it was designed to explain, without a fully pre-registered or out-of-sample test against, say, dynamic causal modeling fits or reinforcement learning baselines on the same datasets.
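The Bayesian model reduction step mentioned in this brief has a closed form when the beliefs in question are Dirichlet distributed. The sketch below evaluates log P̃(o) − log P(o) = log E_{P(ν|o)}[P̃(ν)/P(ν)] for a single Dirichlet distribution, for example one column of the A matrix. Restricting to one column is an assumption made here for brevity, and the function names are hypothetical.

```python
import math


def log_beta(alpha):
    """Log of the multivariate beta function, the Dirichlet normaliser."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def bmr_log_evidence_ratio(prior, posterior, reduced_prior):
    """Return log P~(o) - log P(o) for Dirichlet concentration parameters.

    prior:         concentration parameters of the full prior
    posterior:     concentration parameters of the full posterior
    reduced_prior: concentration parameters of the reduced (candidate) prior
    """
    # The reduced posterior follows analytically from the full posterior.
    reduced_posterior = [q + r - p
                         for p, q, r in zip(prior, posterior, reduced_prior)]
    if min(reduced_posterior) <= 0:
        raise ValueError("reduced posterior must have positive parameters")
    return (log_beta(prior) - log_beta(reduced_prior)
            + log_beta(reduced_posterior) - log_beta(posterior))


# Ten observations of outcome 0 under a flat prior, compared against a
# reduced prior that already favours outcome 0.
print(bmr_log_evidence_ratio([1, 1], [11, 1], [4, 1]))
```

A positive value means the reduced model has higher evidence than the full model, so the reduction would be accepted. No refitting to data is needed, which is what makes the comparison cheap enough to run post hoc.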
Methods (6)
- Belief Propagation: Message-passing inference scheme recovered when the Bethe approximation replaces the mean-field factorization; updates posterior beliefs by minimizing free energy.
- Dirichlet Parameter Accumulation: Learning rule for updating Dirichlet beliefs about likelihood matrix A by adding outer products of observations and state estimates.
- Hebbian Plasticity Update: Synaptic update rule that is formally identical to associative learning; used for learning A.
- Mean-Field Approximation: Variational technique used in active inference to tractably compute posterior beliefs.
- Occam Window Pruning: Pruning policy trees by discarding policies whose expected free energy exceeds that of the best by a threshold.
- Variational Bayes: Mathematical framework for approximating posterior beliefs; converts exact Bayesian inference into optimization.
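The Dirichlet parameter accumulation listed above, **a** = a + Σ(oτ ⊗ **s**τ), can be sketched in a few lines. The sketch assumes outcomes are given as indices (equivalent to one-hot vectors oτ), so each outer product adds the posterior state belief **s**τ to the row of the observed outcome. The function names and the flat initial counts are hypothetical.

```python
def update_likelihood_counts(a, outcomes, state_beliefs):
    """Accumulate Dirichlet counts for the likelihood matrix A.

    a[o][s]:       prior concentration parameters (rows: outcomes, cols: states)
    outcomes:      observed outcome index at each timestep
    state_beliefs: posterior state belief vector at each timestep
    """
    new_a = [list(row) for row in a]      # leave the prior counts untouched
    for o, s in zip(outcomes, state_beliefs):
        for j, s_j in enumerate(s):
            new_a[o][j] += s_j            # one-hot outcome (outer product) belief
    return new_a


def expected_likelihood(a):
    """Posterior mean of A: normalise each column of the counts."""
    n_out, n_states = len(a), len(a[0])
    col = [sum(a[i][j] for i in range(n_out)) for j in range(n_states)]
    return [[a[i][j] / col[j] for j in range(n_states)] for i in range(n_out)]


a0 = [[1.0, 1.0],
      [1.0, 1.0]]
a1 = update_likelihood_counts(a0, [0, 0, 1], [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(a1)
print(expected_likelihood(a1))
```

Counts only ever grow, so repeated co-occurrence of an outcome and a state strengthens the corresponding entry, which is the sense in which the rule resembles associative plasticity.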
Frameworks (14)
- Active Inference: Foundational framework by Karl Friston; this paper synthesizes its discrete state-space formulation.
- Bayesian Brain Hypothesis: Normative theory proposing biological systems perform approximate Bayesian inference through free energy minimization.
- Bayesian Decision Theory: Framework for maximizing expected utility under uncertainty.
- Bethe Approximation: Free energy approximation using two-node marginals.
- Expected Utility Theory: Economic framework for decision-making under risk.
- Free Energy Principle: A foundational variational principle from statistical physics that formalizes how self-organizing systems maintain structural integrity and adapt to their environment by minimizing free energy, a mathematical bound on surprise or prediction error. Originally developed by Karl Friston, the framework unifies action, perception, and learning as processes of active inference, where systems both update internal models of the world and act upon it to reduce the divergence between predictions and observations.
- KL Control / Risk-Sensitive Control: Control approach that minimizes KL divergence to a target distribution; underlies the risk term in expected free energy.
- Marginal Free Energy Approximation: Biologically plausible approximation lying between the mean-field and Bethe approximations.
- Optimal Bayesian Design: Selecting actions to maximize expected information gain.
- Optimal Control Theory: Design of controllers to minimize a cost function.
- Partially Observable Markov Decision Process (POMDP): Modeling framework for discrete state-space decision-making under uncertainty, used as the generative model in active inference.
- Predictive Processing
- Reinforcement Learning: Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
- Variational Message Passing: Algorithm for approximate Bayesian inference based on the mean-field approximation.
Claims (35)
- Under the Markov blanket assumption together with NESS, a generalised synchrony appears, such that the dynamics of internal states can be cast as performing inference over external states via minimisation of variational free energy.
Key theoretical claim linking active inference to physics in Section 2.
- Agents perceive by minimizing variational free energy to ensure model consistency with past observations and act by minimizing expected free energy to make future sensations consistent with preferences.
Formalization of perception-action cycle integrating inference and decision-making.
- Deep temporal models enable long-term policies, modelling slow transitions among hidden states at higher levels in the hierarchy, to contextualise faster state transitions at subordinate levels.
Describes hierarchical planning in Section 6.4.
- Structure learning via Bayesian model reduction has a clear biological interpretation in terms of synaptic decay and switching off certain synaptic connections, reminiscent of REM sleep.
Biological interpretation of Bayesian model reduction.
- Active inference describes the dynamics of systems that persist at non-equilibrium steady-state and that can be statistically segregated from their environment via a Markov blanket.
Sets the theoretical grounding in Section 2.
- In discrete state-space models, agents select from different possible policies to realise their preferences and minimise the surprise that they expect to encounter in the future.
Summarises discrete active inference, Section 2.
- Active inference postulates that agents achieve survival by optimising two complementary objective functions, a variational free energy and an expected free energy.
Core claim of active inference stated in Section 2.
- Winner take-all architectures of decision-making are already commonplace in computational neuroscience, and the softmax function provides a smooth approximation.
Neural plausibility argument for softmax policy selection.
- Expected free energy decomposes into risk (exploitation) and ambiguity (exploration) terms, providing optimal balance between goal-seeking and novelty-seeking.
Key insight into structure of decision-making; explains intrinsic motivation and curiosity.
- The temperature parameter regulating precision of policy selection has a clear biological interpretation in terms of confidence encoded in dopaminergic firing.
Links precision to dopamine, Section 6.3.
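The claims about risk, ambiguity, and precision can be made concrete with a small sketch of policy selection. It assumes a single future timestep per policy, preferences C expressed over states, and strictly positive preference probabilities. The precision γ multiplies G inside the softmax, so Q(π) = σ(−γG(π)) reduces to σ(−G(π)) at γ = 1. All function names and the example numbers are hypothetical.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def expected_free_energy(s_pred, C, A):
    """G = risk + ambiguity for one policy at one future timestep.

    s_pred: predicted state distribution under the policy
    C:      preferred state distribution (strictly positive)
    A:      likelihood matrix, A[o][s] = P(o | s)
    """
    # Risk: KL divergence between predicted and preferred states.
    risk = sum(q * math.log(q / c) for q, c in zip(s_pred, C) if q > 0)
    # Ambiguity: expected entropy of outcomes given each state.
    ambiguity = 0.0
    for j, q in enumerate(s_pred):
        entropy = -sum(A[i][j] * math.log(A[i][j])
                       for i in range(len(A)) if A[i][j] > 0)
        ambiguity += q * entropy
    return risk + ambiguity


def policy_posterior(G, gamma=1.0):
    """Q(pi) = softmax(-gamma * G); higher gamma means more confident selection."""
    return softmax([-gamma * g for g in G])


A = [[0.9, 0.5],
     [0.1, 0.5]]
C = [0.9, 0.1]
predictions = [[0.9, 0.1], [0.1, 0.9]]   # one predicted state distribution per policy
G = [expected_free_energy(s, C, A) for s in predictions]
print(G, policy_posterior(G), policy_posterior(G, gamma=4.0))
```

The first policy matches the preferences and lands mostly in the state with the less ambiguous outcome mapping, so it has lower G on both terms. Raising γ sharpens the softmax toward winner-take-all selection without changing the ranking.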
Hypotheses (2)
- Biological agents use a process theory of active inference where neuronal dynamics correspond to variational free energy minimisation for perception and expected free energy minimisation for action.
The core process theory hypothesis set up in the paper.
- If a system attains a general steady-state, it will appear to behave in a Bayes optimal fashion, both in terms of optimal Bayesian design (exploration) and Bayesian decision theory (exploitation).
Corollary 3 in Appendix B derived from steady-state assumptions.
Questions (5)
- How can active inference be scaled to complex models with many degrees of freedom while maintaining tractable inference?
Another scaling question from Discussion.
- What mechanisms allow biological agents to effectively search deep policy trees when planning into the future?
Scaling challenge for active inference.
- How do biological organisms evolve their generative model to account for new sensory observations?
Structure learning challenge in Discussion.
- What is the generative model that best explains observable data from a behaving agent?
Central challenge for active inference stated in Discussion.
- How do biological agents reduce large policy spaces to tractable subspaces?
Open question regarding computational scaling of policy search.
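One partial answer to the policy-space questions above is the Occam window pruning listed under Methods: discard any policy whose expected free energy exceeds that of the best policy by more than a threshold. A minimal sketch follows. The window width of 3.0 nats is a hypothetical value chosen for illustration, not one taken from the paper, and the function name is likewise an assumption.

```python
def occam_prune(G, window=3.0):
    """Return indices of policies that survive the Occam window.

    G:      expected free energy per policy (lower is better)
    window: maximum allowed excess over the best policy, in nats
    """
    best = min(G)
    return [i for i, g in enumerate(G) if g - best <= window]


G = [1.0, 1.5, 6.0, 4.0]
print(occam_prune(G))
```

Only the surviving policies need to be evaluated at deeper horizons, which cuts cost at each step but, as the questions above note, does not by itself resolve the growth of deep policy trees.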
Original abstract
Active inference is a normative principle underwriting perception, action, planning, decision-making and learning in biological or artificial agents. From its inception, its associated process theory has grown to incorporate complex generative models, enabling simulation of a wide range of complex behaviours. Due to successive developments in active inference, it is often difficult to see how its underlying principle relates to process theories and practical implementation. In this paper, we try to bridge this gap by providing a complete mathematical synthesis of active inference on discrete state-space models. This technical summary provides an overview of the theory, derives neuronal dynamics from first principles and relates this dynamics to biological processes. Furthermore, this paper provides a fundamental building block needed to understand active inference for mixed generative models; allowing continuous sensations to inform discrete representations. This paper may be used as follows: to guide research towards outstanding challenges, a practical guide on how to implement active inference to simulate experimental behaviour, or a pointer towards various in-silico neurophysiological responses that may be used to make empirical predictions.
Related work— refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- Active Inference: A Process Theory · cited · in corpus · 2017 · ≈ 91%
- Active Inference, Curiosity and Insight · cited · in corpus · 2017 · ≈ 90%
- Active inference on discrete state-spaces: a synthesis · Lancelot Da Costa, Thomas Parr, Noor Sajid, Sebastijan Veselic, Victorita Neacsu, Karl Friston · 2021 · ≈ 89%
- Active inference and artificial reasoning · Karl Friston, Lancelot Da Costa, Alexander Tschantz, Conor Heins, Christopher Buckley, Tim Verbelen, Thomas Parr · 2025 · ≈ 89%
- Active inference: demystified and compared · in corpus · 2021 · ≈ 88%
- Neural dynamics under active inference: plausibility and efficiency of information processing · Lancelot Da Costa, Thomas Parr, Biswa Sengupta, Karl Friston · 2021 · ≈ 88%
- Active inference and epistemic value · cited · 2015 · ≈ 87%
- Active Inference and Epistemic Value in Graphical Models · Thijs van de Laar, Magnus Koudahl, Bart van Erp, Bert de Vries · 2022 · ≈ 87%
- Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits · Shohei Wakayama and Nisar Ahmed · 2023 · ≈ 87%
- Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability · Parvin Malekzadeh and Konstantinos N. Plataniotis · 2024 · ≈ 87%
- Active inference, Bayesian optimal design, and expected utility · Noor Sajid, Lancelot Da Costa, Thomas Parr, Karl Friston · 2021 · ≈ 87%
- Inference of Affordances and Active Motor Control in Simulated Agents · Fedor Scholz, Christian Gumbsch, Sebastian Otte, Martin V. Butz · 2022 · ≈ 87%
- Realising Active Inference in Variational Message Passing: the Outcome-blind Certainty Seeker · Théophile Champion, Marek Grześ, Howard Bowman · 2021 · ≈ 87%
- Reframing the Expected Free Energy: Four Formulations and a Unification · Théophile Champion, Howard Bowman, Dimitrije Marković, Marek Grześ · 2024 · ≈ 87%
- Life as we know it · in corpus · 2013 · ≈ 85%
- Free-energy minimization in joint agent-environment systems: A niche construction perspective · cited · 2018 · ≈ 85%
- Free-energy and the brain · cited · 2007 · ≈ 85%
Cross-corpus bridges (10)
same_concept_as · Nomic cosine
External markdown files that talk about the same concept as this entity.
- aboutblank_kb · Active Inference · frameworks/active-inference.md · 0.865
- aboutblank_kb · Free Energy Principle And Active Inference · frameworks/free-energy-principle-and-active-inference.md · 0.853
- aboutblank_kb · Does the Free Energy Principle adequately explain morphogenesis and pattern formation in biological systems? · questions/does-the-free-energy-principle-adequately-explain-morphogenesis.md · 0.807
- aboutblank_kb · Bayesian Inference Model Of Morphogenesis · frameworks/bayesian-inference-model-of-morphogenesis.md · 0.797
- aboutblank_kb · Surprise Minimization Framework · frameworks/surprise-minimization-framework.md · 0.794
- aboutblank_kb · Free Energy Principle · frameworks/free-energy-principle.md · 0.791
- aboutblank_kb · Multi-Level Bayesian Inference · frameworks/multi-level-bayesian-inference.md · 0.789
- aboutblank_kb · Bayesian Mechanics · frameworks/bayesian-mechanics.md · 0.788
- aboutblank_kb · Steve Frank · thinkers/steve-frank.md · 0.787
- aboutblank_kb · Susan Lindquist · thinkers/susan-lindquist.md · 0.781