Active Inference: A Process Theory

By Karl Friston, Thomas FitzGerald, Francesco Rigoli, Philipp Schwartenbeck, Giovanni Pezzulo · California Institute for Machine Consciousness; Institute of Cognitive Sciences and Technologies, National Research Council, Rome (+ 6 more affiliations)

DOI 10.1162/neco_a_00912 · OpenAlex W2552810632

Active inference & free energy principle · "the fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics"

TL;DR

A single variational principle, minimizing variational free energy via gradient descent on a Markov decision process (MDP) generative model, is sufficient to derive neuronal dynamics that reproduce, without hand-tuning, ten well-characterized empirical phenomena simultaneously: repetition suppression, mismatch negativity, violation responses (peaking ~250 ms in peristimulus time when 100 ms conduction delays are allowed), place-cell activity, phase precession, theta sequences, theta-gamma coupling (~4 Hz theta with nested gamma), evidence accumulation with race-to-bound stepping dynamics, and transfer of dopamine responses from unconditioned to conditioned stimuli. The method is an active inference process theory grounded in belief propagation over discrete-time MDP generative models, in which neuronal firing rates encode categorical state expectations, membrane potentials encode their logarithms, and postsynaptic currents correspond to free-energy gradients (state prediction errors). Simulations sample outcomes every 250 ms and use eight hidden states (four locations × two contexts) and utilities of ±3 nats for rewarding versus unrewarding outcomes (each about e³ ≈ 20 times more or less preferred than a neutral, 0-nat outcome). Dopamine is formalized as encoding precision (inverse temperature γ) with a postsynaptic time constant of ~1 s (κ₁/κ₂ = 1/64 per 16 ms iteration). Because gradient descent appears to be a valid description of neuronal activity, variational free energy functions as a Lyapunov function for neuronal dynamics, implying that neural activity conforms to Hamilton's principle of least action and that a single imperative, free energy minimization, unifies perception, action, learning, and neuromodulatory signaling within one coherent process theory.
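The timing and preference figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 250 ms epoch is split evenly across the 16 gradient-descent iterations and that the dopamine (precision) kernel behaves like a first-order leaky integrator with decay rate 1/64 per iteration; `KAPPA` and `leaky_step_response` are hypothetical names, not taken from the paper's code.

```python
import math

EPOCH_MS = 250.0           # one outcome (and one round of belief updating) every 250 ms
ITERATIONS_PER_EPOCH = 16  # gradient-descent iterations per epoch
KAPPA = 1.0 / 64.0         # assumed per-iteration decay rate of the postsynaptic kernel

ms_per_iteration = EPOCH_MS / ITERATIONS_PER_EPOCH  # 15.625 ms, quoted as ~16 ms
time_constant_ms = ms_per_iteration / KAPPA         # 64 iterations, i.e. ~1 s


def leaky_step_response(n_iterations: int, kappa: float = KAPPA) -> float:
    """First-order leaky integrator x <- x + kappa * (1 - x), started at x = 0."""
    x = 0.0
    for _ in range(n_iterations):
        x += kappa * (1.0 - x)
    return x


# Utilities are log-preferences in nats, so +3 (or -3) scales the prior
# preference for an outcome by e^3 (or e^-3) relative to a 0-nat outcome.
ratio_vs_neutral = math.exp(3.0)           # about 20
ratio_reward_vs_no_reward = math.exp(6.0)  # about 403 for the +3 vs -3 contrast

print(f"{ms_per_iteration:.3f} ms per iteration, tau = {time_constant_ms:.0f} ms")
print(f"step response after 64 iterations: {leaky_step_response(64):.3f}")
print(f"e^3 = {ratio_vs_neutral:.1f}, e^6 = {ratio_reward_vs_no_reward:.1f}")
```

Under these assumptions the ~1 s figure is exact (250/16 × 64 = 1000 ms), and the integrator reaches roughly 63% of its final value after 64 iterations, which is what a 1 s time constant means.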

What to take away

1. A Markov decision process (MDP) generative model with eight hidden states, four control states, and utilities of ±3 nats yields neuronal dynamics—derived solely from gradient descent on variational free energy—that simultaneously reproduce repetition suppression, mismatch negativity, phase precession, theta-gamma coupling, place-cell activity, evidence accumulation, race-to-bound dynamics, and dopamine transfer without any hand-tuning of the generative model or inversion scheme.
2. The mismatch negativity emerges as the difference waveform between oddball and standard trials peaking at approximately 80 ms (or 180 ms allowing 100 ms conduction delays to occipital cortex), arising purely from differences in prior beliefs about context rather than differences in stimuli or actions.
3. Violation responses analogous to P300/N400 waveforms reach peak amplitude at approximately 150 ms (or 250 ms in peristimulus time with 100 ms conduction delays) when the agent is forced to remain at the cue location instead of proceeding to reward, reflecting protracted belief updating under policy violation.
4. Theta-gamma coupling emerges as a direct consequence of belief updating every 250 ms (theta cycle), because each observation induces phasic free-energy-gradient updates that necessarily contain high-frequency (gamma) components—no additional oscillatory mechanism is assumed.
5. Dopamine is formalized as encoding precision (inverse temperature γ) with a postsynaptic time constant of approximately 1 s (postsynaptic kernel ratio κ₁/κ₂ = 1/64 per 16 ms iteration), and phasic dopamine responses transfer from the unconditioned stimulus to the conditioned stimulus as context confidence increases across trials, directly reproducing Schultz-style conditioning data.
6. The paper raises an open hypothesis about hippocampal encoding frames: if neurons use a fixed trial-anchored reference frame (as the simulations implement) rather than a moving time frame, a subset of hippocampal units should show extra-classical place-cell activity encoding multi-location trajectories, which is a testable prediction distinguishable from standard place-cell physiology.
7. Policy selection is implemented as a softmax (Gibbs) function of negative expected free energy G(π) with inverse temperature γ, where the precision update β̇ = γ²ε_γ is driven by the prediction error on expected free energy, formally identifying precision updates with reward prediction error signals reported by dopaminergic neurons.
8. To replicate the simulations, a researcher should initialize prior concentration parameters d = 8 for the central location in each context, use relative utilities of ±3 nats for rewarding versus unrewarding outcomes, run gradient descent with step size Δt = 1/4 for 16 iterations per 250 ms epoch over 32 trials, and prune policies whose posterior probability falls below 1/128 to reproduce observed reaction-time reductions.
9. Expected free energy G(π,τ) decomposes into epistemic value (mutual information I(S_τ,O_τ|π) between hidden states and future outcomes, driving exploration) and extrinsic value (log evidence for preferred outcomes, driving exploitation), providing a formal unification of information-gain-based curiosity, KL control, risk-sensitivity, and expected utility theory as special cases of a single objective.
10. When extrinsic utilities are set to zero (all outcomes equally preferred), agents still exhibit structured epistemic foraging—consistently visiting the cue location to resolve context ambiguity and then avoiding the mildly ambiguous baited arms—demonstrating that epistemic value alone, without any reward signal, is sufficient to generate purposeful, non-random behavior.
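Items 7 and 8 describe policy selection as a softmax (Gibbs) distribution over negative expected free energy scaled by precision γ, followed by pruning of policies below 1/128. The pure-Python sketch below covers only the G term described in this summary; any additional terms in the paper's full scheme are omitted, and the G values and the helper names `policy_posterior` and `prune` are hypothetical.

```python
import math
from typing import List, Tuple

PRUNE_THRESHOLD = 1.0 / 128.0  # policies below this posterior probability are dropped


def softmax(x: List[float]) -> List[float]:
    m = max(x)                        # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]


def policy_posterior(G: List[float], gamma: float) -> List[float]:
    """Gibbs distribution over policies: pi = softmax(-gamma * G)."""
    return softmax([-gamma * g for g in G])


def prune(pi: List[float], threshold: float = PRUNE_THRESHOLD) -> Tuple[List[int], List[float]]:
    """Keep policies whose probability is at least `threshold`, then renormalize."""
    kept = [i for i, p in enumerate(pi) if p >= threshold]
    z = sum(pi[i] for i in kept)
    return kept, [pi[i] / z for i in kept]


# Illustrative expected free energies (nats) for four policies; lower is better.
G = [1.0, 2.0, 6.0, 9.0]
for gamma in (0.5, 1.0, 4.0):
    pi = policy_posterior(G, gamma)
    kept, renormalized = prune(pi)
    print(gamma, [round(p, 3) for p in pi], kept)
```

With these illustrative values, all four policies survive at γ = 0.5, while at γ = 1 the two high-G policies fall below 1/128 and are pruned: higher precision concentrates belief on the policies with the lowest expected free energy.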

Peer brief — for seminar discussion

Friston et al. (2017, Neural Computation 29:1–49) derive a neurobiologically grounded process theory by asking what dynamics emerge when gradient descent on variational free energy is applied to a discrete-time Markov decision process (MDP) generative model. The generative model is specified by a likelihood matrix A mapping eight hidden states (four locations × two contexts) to seven outcomes, transition matrices B(u) for four control states, Dirichlet priors over model parameters, and a softmax prior over policies parameterized by expected free energy G(π). All inference, action selection, learning, and precision encoding reduce to minimizing the same free-energy functional. The method is active inference via belief propagation over this MDP, implemented as a gradient descent (equation 2.8) rather than exact Bayesian inference, with neuronal firing rates interpreted as categorical state expectations and membrane potentials as their logarithms. The load-bearing finding is that, without any hand-tuning of the generative model, simulations sampling outcomes every 250 ms reproduce ten empirically documented phenomena from a single imperative: repetition suppression and mismatch negativity (difference wave peaking ~180 ms in peristimulus time), violation responses analogous to the P300 (~250 ms), place-cell selectivity (firing above 80% of maximum at target locations), phase precession across theta cycles, theta-gamma coupling at ~4 Hz, stepping evidence accumulation matching race-to-bound dynamics, and transfer of phasic dopamine (precision) signals from unconditioned to conditioned stimuli across 32 simulated trials. Dopamine is identified with the precision parameter γ = 1/β, with a postsynaptic time constant of ~1 s (κ₁/κ₂ = 1/64 per 16 ms iteration).
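The correspondence in which firing rates s encode state expectations, membrane potentials v encode their logarithms, and state prediction errors ε drive v can be sketched for the simplest case of a single time step, where the exact posterior is proportional to the likelihood of the observed outcome times the prior D. This is an illustrative reduction, assuming a two-state model, made-up A and D values, and a hypothetical `infer_states` helper; it uses the Δt = 1/4 step size and 16 iterations quoted in this summary but omits the transition (B) terms that link successive time steps.

```python
import math
from typing import List


def softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def infer_states(likelihood_row: List[float], prior: List[float],
                 dt: float = 0.25, n_iter: int = 16) -> List[float]:
    """Gradient descent on free energy for one time step.

    v: membrane potentials (log expectations); s = softmax(v): firing rates;
    eps = ln A[o, :] + ln D - ln s: state prediction error; v <- v + dt * eps.
    """
    target = [math.log(a) + math.log(d) for a, d in zip(likelihood_row, prior)]
    v = [math.log(d) for d in prior]  # start from the prior
    for _ in range(n_iter):
        s = softmax(v)
        eps = [t - math.log(si) for t, si in zip(target, s)]
        v = [vi + dt * e for vi, e in zip(v, eps)]
    return softmax(v)


A_row = [0.9, 0.1]  # P(observed outcome | hidden state), illustrative
D = [0.5, 0.5]      # prior over two hidden states, illustrative
s = infer_states(A_row, D)
exact = softmax([math.log(a) + math.log(d) for a, d in zip(A_row, D)])
print([round(x, 3) for x in s], [round(x, 3) for x in exact])
```

Because the softmax ignores constant offsets, each step moves the log expectations a fraction Δt of the way toward the exact log posterior, so 16 steps leave a residual of about (3/4)^16 ≈ 1%.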
Because gradient descent on free energy appears to be a valid description of neuronal dynamics, variational free energy is a Lyapunov function for neural activity, meaning neural dynamics conform to Hamilton's principle of least action. The implications are threefold. First, the exploit/explore trade-off is formally resolved: expected free energy decomposes into epistemic value (mutual information I(S_τ,O_τ|π)) and extrinsic value (log evidence for preferred outcomes), unifying KL control, Bayesian surprise, and expected utility theory as special cases. Second, dopaminergic precision encoding and associative synaptic plasticity (Hebbian learning with decay) fall out as necessary consequences of the same variational principle rather than requiring separate mechanisms. Third, the choice between a fixed trial-anchored and a moving reference frame for hippocampal state encoding is identified as an empirically testable question: the fixed-frame scheme predicts extra-classical place cells encoding multi-location trajectories. The paper also sketches a 'sophisticated' extension in which agents recursively evaluate expected free energy over fictive future observations (Appendix F), contrasting it with the 'naive' scheme presented; the naive scheme is acknowledged to conflate policy-level uncertainty minimization with state-level uncertainty minimization in ways that may matter for planning and metacognition. The most pointed critique is that construct validity rests entirely on qualitative pattern-matching between simulated and empirical waveforms: there is no quantitative model comparison, no fitting to real neurophysiological time series, and no formal null model. Alternative approaches, such as direct Bayesian model selection over competing MDP architectures using empirical EEG or fMRI data (as in computational fMRI with SPM), could have grounded the claimed correspondences quantitatively.
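The decomposition of expected free energy into epistemic and extrinsic value can be illustrated for a single future time step. In this sketch, which is an assumption rather than the paper's code, a policy is summarized by its predicted hidden-state distribution s, outcomes are predicted as o = A s, epistemic value is the mutual information I(S,O) = H(o) − Σⱼ sⱼ H(A[:, j]), extrinsic value is the expected log preference under a distribution built from log-preferences C, and G is their negative sum. The matrices, preferences, and the helper `expected_free_energy` are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # A[outcome][state] = P(outcome | hidden state)


def entropy(p: List[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


def log_softmax(c: List[float]) -> List[float]:
    m = max(c)
    lse = m + math.log(sum(math.exp(x - m) for x in c))
    return [x - lse for x in c]


def expected_free_energy(A: Matrix, s: List[float], C: List[float]) -> Dict[str, float]:
    """One-step expected free energy G = -(epistemic value + extrinsic value)."""
    n_out, n_states = len(A), len(s)
    o = [sum(A[i][j] * s[j] for j in range(n_states)) for i in range(n_out)]  # predicted outcomes
    ambiguity = sum(s[j] * entropy([A[i][j] for i in range(n_out)]) for j in range(n_states))
    epistemic = entropy(o) - ambiguity                              # mutual information I(S, O)
    extrinsic = sum(oi * lp for oi, lp in zip(o, log_softmax(C)))  # expected log preference
    return {"epistemic": epistemic, "extrinsic": extrinsic, "G": -(epistemic + extrinsic)}


s = [0.5, 0.5]                      # uncertain about which of two contexts holds
C_flat = [0.0, 0.0]                 # no outcome is preferred over another
A_cue = [[1.0, 0.0], [0.0, 1.0]]    # the outcome reveals the context
A_blind = [[0.5, 0.5], [0.5, 0.5]]  # the outcome says nothing about the context
for name, A in (("cue", A_cue), ("uninformative", A_blind)):
    print(name, expected_free_energy(A, s, C_flat))
```

With flat preferences both locations have the same extrinsic value (ln ½), so the cue location has lower G on epistemic value (ln 2) alone, a one-step analogue of the epistemic foraging described in the takeaways.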
The breadth of phenomena reproduced is presented as evidence of explanatory power, but an equally valid interpretation is that the MDP framework is flexible enough to be near-unfalsifiable at this level of description, which is exactly the criticism of Bayesian brain theories that the paper explicitly cites (Bowers & Davis, 2012) and claims to address. The 250 ms epoch duration is stipulated rather than derived, and several of the quantitative matches (e.g., the 80 ms MMN peak and the 80% place-cell threshold) may be sensitive to this free parameter, a dependence the paper does not explore systematically.

Findings (9)

Transfer of Dopamine Responses
Learning phenomenon reproduced by active inference: dopamine discharge shifts from unconditioned to conditioned stimuli.
Repetition Suppression
Neural phenomenon reproduced by active inference model: reduced response to repeated stimuli.
Race-to-Bound Dynamics
Decision-making neural dynamics reproduced by active inference; threshold crossing.
Evidence Accumulation
Decision-making phenomenon reproduced by active inference in parietal/prefrontal cortex.
Theta-Gamma Coupling
Hippocampal oscillatory phenomenon reproduced by active inference; phase-amplitude coupling.
Mismatch Negativity
ERP component reproduced by active inference: neural response to prediction violations.
Place Cell Activity
Hippocampal phenomenon reproduced by active inference model.
Phase Precession
Hippocampal neural coding phenomenon reproduced by active inference.
Theta Sequences
Hippocampal sequential activity pattern reproduced by active inference.

Claims (5)

Dopamine discharge encodes changes in expected free energy under posterior vs. prior policy beliefs, representing precision updates.
Links dopamine to precision modulation; reward prediction error reflects expected free energy changes.
All neuronal processing and action selection minimize variational free energy, unifying perception, action, and learning.
Fundamental assertion: single imperative (free energy minimization) explains diverse cognitive and neural phenomena.
Epistemic behavior (exploration) emerges from maximizing mutual information between hidden states and observations.
Formal mechanism for curiosity and information-seeking behavior derived from expected free energy.
Neuronal responses can be described as gradient descent on variational free energy.
Central claim: gradient descent on free energy is a valid process-level description of neural activity.
Behavior prescribed by active inference dynamics is approximately Bayes-optimal.
Process theory outcomes produce normatively sound decision-making.

Hypotheses (1)

Process theories can be derived from variational principles in a straightforward and biologically plausible manner.
Paper's core methodological hypothesis: gap between normative and process-level theories can be bridged.

Questions (2)

Can process theories implementing Bayesian models be derived and shown to explain empirical neuronal phenomena?
Motivating question: bridging normative Bayesian theory and testable neuroscience predictions.
Can neuronal responses be described as a gradient descent on variational free energy?
Central research question: whether process-level neural dynamics conform to free energy minimization.

Original abstract

This article describes a process theory based on active inference and belief propagation. Starting from the premise that all neuronal processing (and action selection) can be explained by maximizing Bayesian model evidence (or minimizing variational free energy), we ask whether neuronal responses can be described as a gradient descent on variational free energy. Using a standard (Markov decision process) generative model, we derive the neuronal dynamics implicit in this description and reproduce a remarkable range of well-characterized neuronal phenomena. These include repetition suppression, mismatch negativity, violation responses, place-cell activity, phase precession, theta sequences, theta-gamma coupling, evidence accumulation, race-to-bound dynamics, and transfer of dopamine responses. Furthermore, the (approximately Bayes' optimal) behavior prescribed by these dynamics has a degree of face validity, providing a formal explanation for reward seeking, context learning, and epistemic foraging. Technically, the fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton's principle of least action.

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows are weighted higher.

Neural dynamics under active inference: plausibility and efficiency of information processing
Lancelot Da Costa, Thomas Parr, Biswa Sengupta, Karl Friston
2021
≈ 91%
Active inference on discrete state-spaces: a synthesis
in corpus
2020
≈ 91%
Active inference on discrete state-spaces: a synthesis
Lancelot Da Costa, Thomas Parr, Noor Sajid, Sebastijan Veselic, Victorita Neacsu, Karl Friston
2021
≈ 89%
Realising Active Inference in Variational Message Passing: the Outcome-blind Certainty Seeker
Théophile Champion, Marek Grześ, Howard Bowman
2021
≈ 89%
A Minimal Active Inference Agent
Simon McGregor, Manuel Baltieri, Christopher L. Buckley
2015
≈ 88%
Deep Active Inference
Kai Ueltzhöffer
2018
≈ 88%
A Free energy principle for the brain (lecture summary)
in corpus
2008
≈ 88%
Active Inference, Curiosity and Insight
in corpus
2017
≈ 88%
Active Inference for Physical AI Agents -- An Engineering Perspective
Bert de Vries
2026
≈ 88%
Realising Synthetic Active Inference Agents, Part II: Variational Message Updates
Thijs van de Laar, Magnus Koudahl, Bert de Vries
2025
≈ 87%
The anatomy of choice: dopamine and decision-making
cited
2014
≈ 87%
Life as we know it
cited
in corpus
2013
≈ 82%
Active inference: demystified and compared
in corpus
2021
≈ 87%
A tale of two densities: active inference is enactive inference
in corpus
2020
≈ 87%
Kalman filters as the steady-state solution of gradient descent on variational free energy
Manuel Baltieri, Takuya Isomura
2021
≈ 87%
Active Inference is a Subtype of Variational Inference
Wouter W. L. Nuijten, Mykola Lukashchuk
2025
≈ 87%
Active inference for action-unaware agents
Filippo Torresan, Keisuke Suzuki, Ryota Kanai, Manuel Baltieri
2025
≈ 87%
Deep Active Inference for Partially Observable MDPs
Otto van der Himst, Pablo Lanillos
2021
≈ 87%
Learning Perception and Planning with Deep Active Inference
Ozan Çatal, Tim Verbelen, Johannes Nauta, Cedric De Boom, Bart Dhoedt
2020
≈ 87%
Deriving time-averaged active inference from control principles
Eli Sennesh, Jordan Theriault, Jan-Willem van de Meent, Lisa Feldman Barrett, Karen Quigley
2022
≈ 87%
Contrastive Active Inference
Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt
2024
≈ 86%
Active Inference and Intentional Behaviour
Karl J. Friston, Tommaso Salvatori, Takuya Isomura, Alexander Tschantz, Alex Kiefer, Tim Verbelen, Magnus Koudahl, Aswin Paul, Thomas Parr, Adeel Razi, Brett Kagan, Christopher L. Buckley, Maxwell J. D. Ramstead
2023
≈ 86%
Active Inference, Evidence Accumulation, and the Urn Task
cited
2014
≈ 86%
Active Inference, homeostatic regulation and adaptive behavioural control
cited
2015
≈ 85%
The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes
cited
2014
≈ 84%
Active inference and epistemic value
cited
2015
≈ 84%
Dopamine, reward learning, and active inference
cited
2015
≈ 84%
Active inference and learning
cited
2016
≈ 84%
Scene Construction, Visual Foraging, and Active Inference
cited
2016
≈ 84%
The anatomy of choice: active inference and agency
cited
2013
≈ 83%

Cited by (5)

Active Inference, Curiosity and Insight
Minimizing expected variational free energy under a discrete-state Markov decision process generative model is sufficient to produce curiosity, epistemic learning, and insight without any additional m…
Active inference: demystified and compared
Active inference agents operating under expected free energy minimization achieve 98.90 [98.00, 99.79] average score in a non-stationary FrozenLake OpenAI gym environment, compared to 64.39 [60.33, 68…
Towards a computational phenomenology of mental action: modelling meta-awareness and attentional control with deep parametric active inference
A tale of two densities: active inference is enactive inference
Ramstead, Kirchhoff, and Friston argue that generative models in active inference under the free energy principle (FEP) are control systems, not structural representations, and that this distinction has…
Active inference on discrete state-spaces: a synthesis
Active inference on discrete state-spaces, formalized as partially observable Markov decision processes (POMDPs) with likelihood matrix A, transition matrix B, and prior D, unifies perception, plannin…

Cross-corpus bridges (4)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

aboutblank_kb
Active Inference · frameworks/active-inference.md · 0.890
aboutblank_kb
Free Energy Principle And Active Inference · frameworks/free-energy-principle-and-active-inference.md · 0.857
aboutblank_kb
Does the Free Energy Principle adequately explain morphogenesis and pattern formation in biological systems? · questions/does-the-free-energy-principle-adequately-explain-morphogenesis.md · 0.809
aboutblank_kb
Free Energy Principle · frameworks/free-energy-principle.md · 0.795