paper
active
2017
1,168
paper:friston-2017-active-inference-process-theory

Active Inference: A Process Theory

TL;DR

A single variational principle, minimizing variational free energy by gradient descent under a Markov decision process (MDP) generative model, is sufficient to derive neuronal dynamics that reproduce, without hand-tuning, more than 10 well-characterized empirical phenomena simultaneously: repetition suppression, mismatch negativity, violation responses (peaking ~250 ms in peristimulus time, allowing for 100 ms conduction delays), place-cell activity, phase precession, theta sequences, theta-gamma coupling (at ~4 Hz theta with nested gamma), evidence accumulation with race-to-bound stepping dynamics, and transfer of dopamine responses from unconditioned to conditioned stimuli. The method introduced is an active inference process theory grounded in belief propagation over discrete-time MDP generative models, where neuronal firing rates encode categorical state expectations, membrane potentials encode their logarithms, and postsynaptic currents correspond to free-energy gradients (state prediction errors). Simulations use outcomes sampled every 250 ms, eight hidden states over four locations and two contexts, and utilities of ±3 nats for rewarding versus unrewarding outcomes (~20-fold preference ratio). Dopamine is formalized as encoding precision (inverse temperature γ) with a postsynaptic time constant of ~1 s (κ₁/κ₂ = 1/64 per 16 ms iteration). Because gradient descent appears to be a valid description of neuronal activity, variational free energy functions as a Lyapunov function for neuronal dynamics. This implies that neural activity conforms to Hamilton's principle of least action and that a single imperative, free energy minimization, unifies perception, action, learning, and neuromodulatory signaling within one coherent process theory.
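The headline numbers in this summary can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and reading a decay ratio of 1/64 per iteration as a time constant of about 64 iterations is an assumption made here, not a statement taken from the paper.

```python
import math

# A utility of 3 nats is a log-preference, so it corresponds to a
# preference ratio of exp(3) relative to a neutral outcome.
utility_nats = 3.0
preference_ratio = math.exp(utility_nats)           # roughly 20-fold

# Sixteen iterations of 16 ms make up one belief-updating epoch.
iteration_ms = 16
iterations_per_epoch = 16
epoch_ms = iteration_ms * iterations_per_epoch      # 256 ms, i.e. about 250 ms
theta_hz = 1000.0 / epoch_ms                        # about 4 Hz

# Assumption: a ratio of 1/64 per iteration is read as a time constant
# of about 64 iterations.
dopamine_time_constant_ms = 64 * iteration_ms       # 1024 ms, i.e. about 1 s

print(round(preference_ratio, 1), epoch_ms, round(theta_hz, 2), dopamine_time_constant_ms)
```

Under these readings the ~20-fold preference ratio, the ~4 Hz theta rhythm, and the ~1 s dopamine time constant all follow from the stated parameters.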

What to take away

  1. A Markov decision process (MDP) generative model with eight hidden states, four control states, and utilities of ±3 nats yields neuronal dynamics, derived solely from gradient descent on variational free energy, that simultaneously reproduce repetition suppression, mismatch negativity, phase precession, theta-gamma coupling, place-cell activity, evidence accumulation, race-to-bound dynamics, and dopamine transfer without any hand-tuning of the generative model or inversion scheme.
  2. The mismatch negativity emerges as the difference waveform between oddball and standard trials, peaking at approximately 80 ms (or 180 ms allowing 100 ms conduction delays to occipital cortex). It arises purely from differences in prior beliefs about context rather than differences in stimuli or actions.
  3. Violation responses analogous to P300/N400 waveforms reach peak amplitude at approximately 150 ms (or 250 ms in peristimulus time with 100 ms conduction delays) when the agent is forced to remain at the cue location instead of proceeding to reward, reflecting protracted belief updating under policy violation.
  4. Theta-gamma coupling emerges as a direct consequence of belief updating every 250 ms (theta cycle), because each observation induces phasic free-energy-gradient updates that necessarily contain high-frequency (gamma) components; no additional oscillatory mechanism is assumed.
  5. Dopamine is formalized as encoding precision (inverse temperature γ) with a postsynaptic time constant of approximately 1 s (postsynaptic kernel ratio κ₁/κ₂ = 1/64 per 16 ms iteration). Phasic dopamine responses transfer from the unconditioned stimulus to the conditioned stimulus as context confidence increases across trials, directly reproducing Schultz-style conditioning data.
  6. The paper raises an open hypothesis about hippocampal encoding frames: if neurons use a fixed trial-anchored reference frame (as the simulations implement) rather than a moving time frame, a subset of hippocampal units should show extra-classical place-cell activity encoding multi-location trajectories, which is a testable prediction distinguishable from standard place-cell physiology.
  7. Policy selection is implemented as a softmax (Gibbs) function of negative expected free energy G(π) with inverse temperature γ, where the precision update β̇ = γ²ε_γ is driven by the prediction error on expected free energy, formally identifying precision updates with reward prediction error signals reported by dopaminergic neurons.
  8. To replicate the simulations, a researcher should initialize prior concentration parameters d = 8 for the central location in each context, use relative utilities of ±3 nats for rewarding versus unrewarding outcomes, run gradient descent with step size Δt = 1/4 for 16 iterations per 250 ms epoch over 32 trials, and prune policies whose posterior probability falls below 1/128 to reproduce observed reaction-time reductions.
  9. Expected free energy G(π,τ) decomposes into epistemic value (mutual information I(S_τ,O_τ|π) between hidden states and future outcomes, driving exploration) and extrinsic value (log evidence for preferred outcomes, driving exploitation), providing a formal unification of information-gain-based curiosity, KL control, risk-sensitivity, and expected utility theory as special cases of a single objective.
  10. When extrinsic utilities are set to zero (all outcomes equally preferred), agents still exhibit structured epistemic foraging, consistently visiting the cue location to resolve context ambiguity and then avoiding the mildly ambiguous baited arms. This demonstrates that epistemic value alone, without any reward signal, is sufficient to generate purposeful, non-random behavior.
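The policy-selection and pruning steps in the list above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the expected free energy values in `G` are made up, the function names are hypothetical, and only the softmax over negative expected free energy and the 1/128 pruning threshold come from the summary.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def policy_posterior(expected_free_energy, gamma):
    """Gibbs distribution over policies: P(pi) proportional to exp(-gamma * G(pi))."""
    return softmax([-gamma * g for g in expected_free_energy])

def prune(posterior, threshold=1.0 / 128):
    """Zero out policies below the threshold and renormalize the survivors."""
    kept = [p if p >= threshold else 0.0 for p in posterior]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical expected free energies (in nats) for four policies.
G = [1.0, 2.0, 4.0, 9.0]

low_precision = policy_posterior(G, gamma=0.5)   # diffuse beliefs about policies
high_precision = policy_posterior(G, gamma=2.0)  # confident, near-deterministic choice
pruned = prune(high_precision)                   # unlikely policies are dropped

print([round(p, 3) for p in low_precision])
print([round(p, 3) for p in pruned])
```

Raising γ concentrates probability on the policy with the lowest expected free energy, which is the sense in which precision plays the role of an inverse temperature.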

Peer brief — for seminar discussion

Friston et al. (2017, Neural Computation 29:1–49) derive a neurobiologically grounded process theory by asking what dynamics emerge when gradient descent on variational free energy is applied to a discrete-time Markov decision process (MDP) generative model. The generative model is specified by a likelihood matrix A mapping eight hidden states (four locations × two contexts) to seven outcomes, transition matrices B(u) for four control states, Dirichlet priors over model parameters, and a softmax prior over policies parameterized by expected free energy G(π). All inference, action selection, learning, and precision encoding reduce to minimizing the same free-energy functional. The method introduced is active inference via belief propagation over this MDP, implemented as a gradient descent (equation 2.8) rather than exact Bayesian inference, with neuronal firing rates interpreted as categorical state expectations and membrane potentials as their logarithms. The load-bearing finding is that, without any hand-tuning of the generative model, simulations sampling outcomes every 250 ms reproduce more than ten empirically documented phenomena from a single imperative: repetition suppression and mismatch negativity (difference wave peaking ~180 ms in peristimulus time), violation responses analogous to P300 (~250 ms), place-cell selectivity exceeding 80% maximum firing at target locations, phase precession across theta cycles, theta-gamma coupling at ~4 Hz, stepping evidence accumulation matching race-to-bound dynamics, and transfer of phasic dopamine (precision) signals from unconditioned to conditioned stimuli across 32 simulated trials. Dopamine is identified with the precision parameter γ = 1/β, with a postsynaptic time constant of ~1 s (κ₁/κ₂ = 1/64 per 16 ms iteration). 
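A stripped-down sketch of the gradient descent described above may help fix ideas. It assumes a single time step with one observed outcome, so the forward and backward messages through the transition matrices B(u) are omitted. The prior, the likelihood column, and all names are hypothetical. Only the step size of 1/4, the 16 iterations, and the reading of v as a log expectation with s = softmax(v) follow the summary.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def free_energy(s, log_joint):
    """F = sum_i s_i * (ln s_i - ln P(o, state i))."""
    return sum(si * (math.log(si) - lj) for si, lj in zip(s, log_joint))

# Hypothetical prior over four hidden states and likelihood of the observed outcome.
prior = [0.25, 0.25, 0.25, 0.25]
likelihood = [0.7, 0.1, 0.1, 0.1]
log_joint = [math.log(a) + math.log(d) for a, d in zip(likelihood, prior)]

v = [math.log(d) for d in prior]   # "membrane potential": log expectation
step = 0.25                        # step size of 1/4
trace = []                         # free energy after each iteration

for _ in range(16):                # 16 iterations per epoch
    s = softmax(v)                 # "firing rate": state expectation
    trace.append(free_energy(s, log_joint))
    # State prediction error: mismatch between log joint and log expectation.
    error = [lj - math.log(si) for lj, si in zip(log_joint, s)]
    v = [vi + step * ei for vi, ei in zip(v, error)]

s = softmax(v)
trace.append(free_energy(s, log_joint))

evidence = sum(a * d for a, d in zip(likelihood, prior))
exact_posterior = [a * d / evidence for a, d in zip(likelihood, prior)]
print([round(x, 3) for x in s], [round(x, 3) for x in exact_posterior])
```

In this toy setting the free energy never increases across iterations and the expectation approaches the exact posterior, which illustrates on a small scale what it means for free energy to act as a Lyapunov function.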
Because gradient descent on free energy is a valid description of dynamics, variational free energy is a Lyapunov function for neural activity, meaning neural dynamics conform to Hamilton's principle of least action. The implications are threefold. First, the exploit/explore trade-off is formally resolved: expected free energy decomposes into epistemic value (mutual information I(S_τ,O_τ|π)) and extrinsic value (log evidence for preferred outcomes), unifying KL control, Bayesian surprise, and expected utility theory as special cases. Second, dopaminergic precision encoding and associative synaptic plasticity (Hebbian learning with decay) fall out as necessary consequences of the same variational principle rather than requiring separate mechanisms. Third, an alternative between a fixed trial-anchored and a moving reference frame for hippocampal state encoding is identified as an empirically testable prediction: the fixed-frame scheme predicts extra-classical place cells encoding multi-location trajectories. The paper also sketches a 'sophisticated' extension in which agents recursively evaluate expected free energy over fictive future observations (Appendix F), contrasting it with the 'naive' scheme presented; the naive scheme is acknowledged to conflate policy-level uncertainty minimization with state-level uncertainty minimization in ways that may matter for planning and metacognition. The most pointed critique a critical reader would press is that the construct validity rests entirely on qualitative pattern-matching between simulated and empirical waveforms—no quantitative model comparison, no fitting to real neurophysiological time-series, and no formal null model is provided. Alternative approaches, such as direct Bayesian model selection over competing MDP architectures using empirical EEG or fMRI data (as in computational fMRI with SPM), could have grounded the claimed correspondences quantitatively. 
The breadth of phenomena reproduced is presented as evidence of explanatory power, but an equally valid interpretation is that the MDP framework is flexible enough to be near-unfalsifiable at this level of description, exactly the criticism of Bayesian brain theories that the paper explicitly cites (Bowers & Davis, 2012) and claims to address. The 250 ms epoch duration is stipulated rather than derived, and many of the quantitative matches (e.g., 80 ms MMN peak, 80% place-cell threshold) are sensitive to this free parameter in ways not explored systematically.
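The decomposition of expected free energy into epistemic and extrinsic value discussed in this brief can be made concrete for a single future time step. The sketch below is an assumption-laden toy: two hidden contexts, two outcomes, a hypothetical informative "cue" likelihood and an uninformative "flat" one, and flat preferences (utilities set to zero). The function and variable names are invented for illustration.

```python
import math

def expected_free_energy(A, q_states, log_pref):
    """Return (epistemic, extrinsic, G) for one future time step.

    A[o][s] is P(o | s), q_states[s] is Q(s | pi), log_pref[o] is ln P(o).
    G = -epistemic - extrinsic, so lower G means a more attractive policy.
    """
    n_out, n_states = len(A), len(q_states)
    # Predicted outcomes under the policy: Q(o | pi) = sum_s P(o | s) Q(s | pi).
    q_out = [sum(A[o][s] * q_states[s] for s in range(n_states)) for o in range(n_out)]

    # Epistemic value: mutual information between hidden states and outcomes.
    epistemic = 0.0
    for o in range(n_out):
        for s in range(n_states):
            joint = A[o][s] * q_states[s]
            if joint > 0.0:
                epistemic += joint * math.log(A[o][s] / q_out[o])

    # Extrinsic value: expected log preference over predicted outcomes.
    extrinsic = sum(q * lp for q, lp in zip(q_out, log_pref))
    return epistemic, extrinsic, -epistemic - extrinsic

q_context = [0.5, 0.5]                        # maximal uncertainty about context
flat_pref = [math.log(0.5), math.log(0.5)]    # all outcomes equally preferred

A_cue = [[0.98, 0.02], [0.02, 0.98]]          # hypothetical informative cue
A_flat = [[0.5, 0.5], [0.5, 0.5]]             # hypothetical uninformative location

epi_cue, ext_cue, G_cue = expected_free_energy(A_cue, q_context, flat_pref)
epi_flat, ext_flat, G_flat = expected_free_energy(A_flat, q_context, flat_pref)
print(round(G_cue, 3), round(G_flat, 3))
```

With flat preferences the extrinsic terms are identical, so the policy leading to the informative cue has the lower expected free energy purely because of its epistemic value. This is the logic behind the epistemic foraging result, shown here on a toy model rather than the paper's eight-state simulation.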

Findings (9)

  • Transfer of Dopamine Responses

    Learning phenomenon reproduced by active inference: dopamine discharge shifts from unconditioned to conditioned stimuli.

  • Repetition Suppression

    Neural phenomenon reproduced by active inference model: reduced response to repeated stimuli.

  • Race-to-Bound Dynamics

    Decision-making neural dynamics reproduced by active inference; threshold crossing.

  • Evidence Accumulation

    Decision-making phenomenon reproduced by active inference in parietal/prefrontal cortex.

  • Theta-Gamma Coupling

    Hippocampal oscillatory phenomenon reproduced by active inference; phase-amplitude coupling.

  • Mismatch Negativity

    ERP component reproduced by active inference: neural response to prediction violations.

  • Place Cell Activity

    Hippocampal phenomenon reproduced by active inference model.

  • Phase Precession

    Hippocampal neural coding phenomenon reproduced by active inference.

  • Theta Sequences

    Hippocampal sequential activity pattern reproduced by active inference.

Original abstract

This article describes a process theory based on active inference and belief propagation. Starting from the premise that all neuronal processing (and action selection) can be explained by maximizing Bayesian model evidence, or minimizing variational free energy, we ask whether neuronal responses can be described as a gradient descent on variational free energy. Using a standard (Markov decision process) generative model, we derive the neuronal dynamics implicit in this description and reproduce a remarkable range of well-characterized neuronal phenomena. These include repetition suppression, mismatch negativity, violation responses, place-cell activity, phase precession, theta sequences, theta-gamma coupling, evidence accumulation, race-to-bound dynamics, and transfer of dopamine responses. Furthermore, the (approximately Bayes' optimal) behavior prescribed by these dynamics has a degree of face validity, providing a formal explanation for reward seeking, context learning, and epistemic foraging. Technically, the fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton's principle of least action.

Cross-corpus bridges (4)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

  • aboutblank_kb
    Active Inference · frameworks/active-inference.md · 0.890
  • aboutblank_kb
    Free Energy Principle And Active Inference · frameworks/free-energy-principle-and-active-inference.md · 0.857
  • aboutblank_kb
    Does the Free Energy Principle adequately explain morphogenesis and pattern formation in biological systems? · questions/does-the-free-energy-principle-adequately-explain-morphogenesis.md · 0.809
  • aboutblank_kb
    Free Energy Principle · frameworks/free-energy-principle.md · 0.795