Active Inference with a Self-Prior in the Mirror-Mark Task

By Dongmin Kim · Hoshinori Kanazawa · Yasuo Kuniyoshi, Laboratory for Intelligent Systems and Informatics, University of Tokyo

DOI 10.48550/arxiv.2604.09673 arXiv 2604.09673 OpenAlex W7154335355

Self Awareness · Mirror Self-Recognition Test · Cross-Modal Sampling · Self-Prior · Sticker Removal Success Criterion · STORM

TL;DR

Spontaneous mark-directed behavior in the mirror-mark task emerges from a single internal mechanism, the self-prior, combined with expected free energy minimization, without any external reward signal. A simulated infant model built on the EMFANT platform (MuJoCo-based, 12 DoF) achieves sticker removal in approximately 70% of evaluation episodes across 80 runs, using only 64×64×3 RGB vision and 12-dimensional proprioception, with tactile input deliberately withheld. The self-prior is implemented as a GPT-like Transformer that autoregressively models the joint distribution over 32-class discrete latent variables and is trained almost exclusively on sticker-free experiences (sticker-bearing episodes make up only about 5% of its training data). When a sticker is present, the latent state falls outside the learned high-density region, and the policy, built on STORM's DreamerV3-like world model extended with active inference, selects actions that drive the state back toward familiar self-experience. Across all 80 evaluations, expected free energy after sticker removal (67.33 ± 8.94) was significantly lower than before removal (79.33 ± 4.34; Wilcoxon signed-rank p = 6.33 × 10⁻⁹), and cross-modal sampling showed that visual self-appearance can be recovered from proprioception alone, supporting the interpretation of the self-prior as a probabilistic body schema. The paper argues this constitutes a computational implementation of Mitchell's inductive theory of mirror self-recognition and that the free energy principle can serve as a unifying, parsimonious hypothesis for investigating the developmental origins of self-awareness.
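The density-mismatch idea can be illustrated with a small sketch. This is not the authors' implementation: the GPT-like Transformer is replaced by a hypothetical fixed table of per-variable categorical probabilities (so the variables are treated as independent rather than autoregressively conditioned), and the names `make_toy_prior` and `negative_log_likelihood` are invented for this example. It only shows how a latent state of 32 variables with 32 classes each can be scored by negative log-likelihood, so that familiar states score low and perturbed states score high.

```python
import math
import random

NUM_VARS = 32     # number of discrete latent variables (from the summary)
NUM_CLASSES = 32  # classes per variable (from the summary)

def make_toy_prior(familiar, peak=0.9):
    """Hypothetical stand-in for a trained self-prior.

    Puts probability `peak` on the familiar class of each variable and
    spreads the remainder uniformly over the other classes.
    """
    rest = (1.0 - peak) / (NUM_CLASSES - 1)
    return [[peak if c == familiar[i] else rest for c in range(NUM_CLASSES)]
            for i in range(NUM_VARS)]

def negative_log_likelihood(prior, latent):
    """Sum of -log p(z_i) over all latent variables (lower = more familiar)."""
    return -sum(math.log(prior[i][z]) for i, z in enumerate(latent))

rng = random.Random(0)
familiar_state = [rng.randrange(NUM_CLASSES) for _ in range(NUM_VARS)]
prior = make_toy_prior(familiar_state)

# A "sticker" is modeled here as a perturbation of a few latent variables.
sticker_state = list(familiar_state)
for i in range(4):
    sticker_state[i] = (sticker_state[i] + 1) % NUM_CLASSES

nll_clean = negative_log_likelihood(prior, familiar_state)
nll_sticker = negative_log_likelihood(prior, sticker_state)
print(f"clean: {nll_clean:.2f}  sticker: {nll_sticker:.2f}")
```

In the summarized model the score comes from an autoregressive Transformer and enters the expected free energy; the table here is only a placeholder that makes the low-density effect visible.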

What to take away

1. A simulated 12-DoF infant agent on the EMFANT/MuJoCo platform removes a facial sticker in approximately 70% of evaluation episodes by 500k training steps, up from roughly 20% early in training, using only vision and proprioception with no tactile input.
2. Expected free energy after sticker removal (67.33 ± 8.94) is significantly lower than before removal (79.33 ± 4.34) across all 80 evaluations, confirmed by a Wilcoxon signed-rank test at p = 6.33 × 10⁻⁹.
3. The self-prior is implemented as a GPT-like autoregressive Transformer that models the joint distribution of a discrete latent state (32 variables with 32 classes each). It is trained almost exclusively on sticker-free episodes (see item 6), so sticker-bearing observations receive low probability and thus high expected free energy.
4. The architecture extends STORM (a DreamerV3-like Transformer-based world model) by replacing external reward with expected free energy computed from the self-prior, omitting STORM's reward predictor and continuation predictor entirely.
5. Training proceeds in three progressive stages: world model training starts after episode 100, self-prior training after episode 120, and policy training after episode 140, with policy rollouts imagined over H = 16 future steps.
6. Only about 5% of the self-prior's training data comes from sticker-bearing episodes. This deliberately yields a prior that represents the sticker-free self, so that any sticker induces a detectable density mismatch.
7. Cross-modal sampling demonstrates that the self-prior captures visual–proprioceptive associations: a visual self-appearance can be reconstructed from proprioceptive input alone, functioning as a probabilistic body schema consistent with Paillard/Gallagher body schema theories.
8. The paper raises an open question about whether occlusion of the sticker by the agent's own hand and kinematic reachability limits constitute the primary ceiling on performance, implying that the 70% success rate may not reflect a cognitive ceiling but a physical one.
9. To replicate the methodology, sample the replay buffer so that sticker-bearing episodes make up only ~5% of the self-prior's training data, while using the full dataset (50/50 sticker/no-sticker, 50/50 random/policy mix) for world model and policy training. The optimizers are AdamW for the world model and self-prior and Adam for the policy and value networks, with policy gradient clipping via ZClip.
10. The paper predicts that integrating the tactile modality would improve learning efficiency, motivated by Chinn et al.'s finding that tactile target experience accelerates mirror self-recognition in infants, and identifies this as the primary direction for future work.
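The staged schedule (item 5) and the skewed self-prior sampling (items 6 and 9) can be sketched as follows. This is a minimal illustration under assumptions, not the released code: the function names, the buffer layout, and the per-slot Bernoulli draw are hypothetical, and "after episode N" is read as `episode > N`.

```python
import random

# Episode thresholds reported in the summary.
WORLD_MODEL_START = 100
SELF_PRIOR_START = 120
POLICY_START = 140
STICKER_FRACTION = 0.05  # approximate share of sticker episodes for the self-prior

def active_components(episode):
    """Return which components are being trained at a given episode index."""
    return {
        "world_model": episode > WORLD_MODEL_START,
        "self_prior": episode > SELF_PRIOR_START,
        "policy": episode > POLICY_START,
    }

def sample_self_prior_batch(clean, sticker, batch_size, rng):
    """Draw a batch where each slot is a sticker episode with ~5% probability."""
    batch = []
    for _ in range(batch_size):
        pool = sticker if rng.random() < STICKER_FRACTION else clean
        batch.append(rng.choice(pool))
    return batch

rng = random.Random(0)
clean_episodes = [("clean", i) for i in range(500)]      # hypothetical buffer contents
sticker_episodes = [("sticker", i) for i in range(500)]  # hypothetical buffer contents

batch = sample_self_prior_batch(clean_episodes, sticker_episodes, 2000, rng)
share = sum(1 for kind, _ in batch if kind == "sticker") / len(batch)
print(active_components(130), f"sticker share = {share:.3f}")
```

The world model and policy would draw from the full 50/50 buffer instead; only the self-prior sees the skewed mix.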

Peer brief — for seminar discussion

Kim, Kanazawa, and Kuniyoshi model the mirror-mark task in a 12-DoF simulated infant on the EMFANT platform (MuJoCo physics) by combining a Transformer-based self-prior with active inference, training for 500k steps over 50,000 episodes and evaluating across 80 runs (8 seeds × 10 runs). The core method introduced is the self-prior: a GPT-like autoregressive Transformer that learns the density of sticker-free multisensory latent states, implemented within a STORM/DreamerV3-style world model extended to replace external reward with expected free energy derived entirely from self-prior mismatch. The load-bearing finding is that sticker-removal probability rises from roughly 20% early in training to approximately 70% by 500k steps, while expected free energy after removal (67.33 ± 8.94) is significantly lower than before (79.33 ± 4.34; Wilcoxon p = 6.33 × 10⁻⁹), confirming that the self-prior functions as an internal criterion for distinguishing self from non-self without any explicit reward or coordinate-transform module. Cross-modal sampling further establishes that visual appearance can be recovered from 12-dimensional proprioception alone, grounding the self-prior as a probabilistic body schema. An alternative would have been a direct coordinate-transform mapping from visual anomaly to motor command, as in Hoffmann et al.'s robot mirror work, but that approach requires an externally designed linkage rather than emergent motivation. The paper interprets these results as a computational implementation of Mitchell's inductive theory of mirror self-recognition (which requires kinesthetic-visual matching and implicit mirror correspondence learning) and positions the free energy principle as a parsimonious unifying hypothesis for developmental self-awareness research, mapping the model to Level 3 (identification) of Rochat's five-level framework.
The most contestable point is the conflation of mark-directed behavior with mirror self-recognition: the failure analysis itself reveals that the agent sometimes reduces expected free energy by occluding the sticker visually rather than physically removing it, which means the mismatch criterion is sensitive to perceptual change rather than necessarily to bodily integrity, calling into question whether the mechanism truly distinguishes self from non-self or merely tracks familiar low-entropy visual states. Additionally, the study is confined to simulation; prior computational models of mirror self-recognition by Lanillos et al. and Hoffmann et al. have been validated on real robots, and whether the self-prior's latent-space density estimates would survive the noise and distribution shift of physical hardware remains untested.
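For readers who want to see what the paired before/after comparison involves, the following is a minimal pure-Python sketch of a two-sided Wilcoxon signed-rank test using the normal approximation. It is not the authors' analysis code: the function name is hypothetical, no tie or continuity correction is applied, and the data are synthetic placeholders rather than the paper's measurements.

```python
import math
import random

def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped and tied ranks are averaged.
    Assumes at least one non-zero difference.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w, p

# Synthetic placeholder data only; these are NOT the paper's measurements.
rng = random.Random(0)
before = [rng.gauss(79.0, 4.0) for _ in range(80)]
after = [b - rng.gauss(12.0, 8.0) for b in before]

w, p = wilcoxon_signed_rank(before, after)
print(f"W = {w:.1f}, p = {p:.3g}")
```

The test is paired, so each evaluation contributes one before/after difference; that is why the comparison is made within evaluations rather than between two independent groups.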

Methods (2)

Cross-Modal Sampling
Technique used to demonstrate that the self-prior captures visual–proprioceptive associations by recovering visual appearance from proprioception alone
Sticker Removal Success Criterion
Operational definition: hand stays within 2 cm of sticker for 50 consecutive steps (0.5 seconds)
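The success criterion lends itself to a short check. The sketch below is illustrative: the function name, the use of metres, the inclusive `<=` threshold, and the distance trace are assumptions rather than details from the paper.

```python
REQUIRED_STEPS = 50     # consecutive steps (0.5 seconds per the definition above)
MAX_DISTANCE_M = 0.02   # 2 cm, expressed in metres

def first_success_step(distances, max_distance=MAX_DISTANCE_M,
                       required=REQUIRED_STEPS):
    """Return the index of the step that completes the first qualifying run,
    or None if the hand never stays close enough for long enough."""
    run = 0
    for step, d in enumerate(distances):
        run = run + 1 if d <= max_distance else 0  # any far step resets the run
        if run >= required:
            return step
    return None

# Hypothetical trace: approach, brief contact, drift away, sustained contact.
trace = [0.10] * 20 + [0.015] * 30 + [0.05] * 5 + [0.01] * 60
print(first_success_step(trace))
```

The 30-step contact in the middle of the trace does not count, because the run resets as soon as the hand drifts beyond the threshold.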

Frameworks (3)

Mirror Self-Recognition Test
The behavioral paradigm (mark/sticker placed on face, checked in mirror) used to evaluate self-awareness in animals and infants
Self-Prior
The key novel contribution: an internal model that learns the density of familiar multisensory experiences and drives mark-removal behavior by reducing mismatch under the free energy principle
STORM
The transformer-based world model framework upon which the present architecture is built

Findings (10)

Passing a sticker-bearing latent through the self-prior removes the sticker in the reconstruction, confirming that the distribution favors the sticker-free state
Shows the self-prior's generative distribution rejects sticker-bearing states
In a single illustrative episode (seed 2), mean EFE after sticker removal was 12.00 lower than before removal after 500k training steps
Qualitative confirmation of the EFE drop in the trained model, in contrast to the untrained model (Δ = +1.70)
Mean hand-sticker distance decreased gradually across 500k training steps, including before removal probability exceeded 50%
Suggests the agent learned to recognize and approach the sticker before achieving reliable removal
Visual self-appearance can be recovered from proprioception alone via cross-modal sampling through the self-prior
Demonstrates the self-prior captures visual-proprioceptive associations, functioning as a probabilistic body schema
EFE decrease after sticker removal is statistically significant (Wilcoxon p = 6.33×10⁻⁹) across 80 evaluations
Confirms that EFE systematically decreases after sticker removal, validating the self-prior as internal criterion
Untrained model (0 training steps) shows no clear EFE difference before and after sticker removal (Δ = +1.70)
Control showing that the EFE signal is learned, not inherent to the architecture
Agent achieves approximately 70% sticker-removal success rate by end of 500k training steps
Main behavioral result demonstrating the model's efficacy in the mirror-mark task
Samples drawn from the trained self-prior correspond to sticker-free self in diverse poses
Demonstrates the self-prior learned the sticker-free body distribution as intended
Sticker-removal success rate stayed near 20% in the early phase of training
Shows learning progression from chance-level to functional behavior
Mean EFE before sticker removal across 80 evaluations: 79.33 ± 4.34
Baseline EFE when sticker is present, used for comparison
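The cross-modal sampling finding above rests on conditional sampling from a joint model: fix the proprioceptive part and sample the visual part. The sketch below illustrates that idea with a count-based conditional table over toy tokens. It is an assumption-laden stand-in, not the paper's Transformer over latent variables, and all names and tokens are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_conditional(pairs):
    """Estimate p(visual | proprio) by counting co-occurrences."""
    counts = defaultdict(Counter)
    for proprio, visual in pairs:
        counts[proprio][visual] += 1
    return counts

def sample_visual(counts, proprio, rng):
    """Sample a visual token conditioned on a fixed proprioceptive token."""
    tokens = list(counts[proprio].keys())
    weights = list(counts[proprio].values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
# Hypothetical paired experience: the arm pose mostly determines what is seen.
pose_to_view = {"arm_down": "hand_low", "arm_up": "hand_high"}
pairs = []
for _ in range(1000):
    pose = rng.choice(list(pose_to_view))
    view = pose_to_view[pose] if rng.random() < 0.95 else "blurred"
    pairs.append((pose, view))

model = fit_conditional(pairs)
samples = Counter(sample_visual(model, "arm_up", rng) for _ in range(200))
print(samples.most_common(1)[0][0])
```

Because the table was fit only on paired experience, conditioning on a pose yields the view that usually accompanies it, which is the sense in which a joint density can act as a body schema.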

Claims (11)

Mark-directed behavior by itself does not constitute evidence of higher-order self-consciousness; the present study is a computational hypothesis about key behavior, not a complete account of mirror self-recognition
Epistemic humility claim limiting the scope of the paper's contribution
The self-prior is distinct from prior intrinsic motivation approaches because it models the density of the agent's own familiar multisensory experiences rather than improving exploration efficiency
Differentiates the self-prior from existing intrinsic motivation work
The sticker-removal behavior induced by the self-prior corresponds to stimulus-elicited intention rather than endogenous intention, aligning with the developmental view of early intentional agency
Connects the model's behavior to Zaadnoordijk and Bayne's taxonomy of intentional agency
The model constitutes a computational implementation of the inductive theory of mirror self-recognition, implementing kinesthetic-visual matching and implicit mirror correspondence learning
Claims the model satisfies the core requirements of Mitchell's inductive theory
Density evaluation of the self-prior in latent space approximates density evaluation in observation space because the latent state is a sufficient statistic of the observation
Theoretical justification for implementing the self-prior in latent rather than observation space
Failure episodes are primarily caused by the hand occluding the sticker in the mirror or the sticker leaving the visual field due to head rotation, plus kinematic reachability limits
Explains the ceiling on removal success as due to perceptual and kinematic constraints, not principled failures
The self-prior is functionally analogous to the body schema: it captures cross-modal associations and directly guides action planning toward multisensory mismatch reduction
Theoretical interpretation linking the self-prior to the established body schema concept
The self-prior operates as an internal criterion for distinguishing self from non-self, without external reward or explicitly computed sticker location
Central interpretive claim of the paper, supported by EFE decrease after sticker removal
A sensory anomaly (sticker mismatch) can itself become an intrinsic drive for action under active inference, even without external reward
Core mechanism claim linking mismatch detection to behavior through EFE minimization
The free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness
Broad theoretical claim connecting the model's success to the FEP as a unifying framework
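The latent-space density claim listed above can be made explicit under one simplifying assumption that this summary does not state: that the encoder is a deterministic map $z = f(o)$. The symbols $o$, $z$, and $f$ are introduced here for illustration only. A sketch of the argument:

```latex
% Assumption (not from the paper summary): deterministic encoder z = f(o),
% so each observation o maps to exactly one latent state.
\begin{align}
p(o) &= \sum_{z'} p(z')\, p(o \mid z') = p\bigl(f(o)\bigr)\, p\bigl(o \mid f(o)\bigr), \\
\log p(o) &= \log p(z) + \log p(o \mid z), \qquad z = f(o).
\end{align}
```

If the residual term $\log p(o \mid z)$ varies little across observations, that is, if the latent retains nearly all the information that distinguishes one observation from another, then ranking states by $\log p(z)$ approximates ranking them by $\log p(o)$.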

Hypotheses (2)

Incorporating object permanence and objectification of body parts would enable exploration of developmental pathways toward Rochat's Level 4 and beyond
Future work hypothesis about extending the model to implement the deductive theory
Integrating the tactile modality into the self-prior model may improve learning efficiency for mirror self-recognition
Forward-looking prediction based on Chinn et al.'s finding that tactile experience promotes earlier MSR in infants

Questions (3)

How can mark-directed behavior emerge from the agent's own experience alone, without external reward or an explicit goal?
The central research question motivating the paper
Does mark-directed behavior in animals like cleaner fish constitute genuine mirror self-recognition?
Active debate referenced to contextualize the limits of behavioral evidence
To what level does this model actually implement mirror self-recognition?
Explicitly posed in the discussion to frame the theoretical contribution

Original abstract

The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual–proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.

The Role of Valence and Meta-awareness in Mirror Self-recognition Using Hierarchical Active Inference
Jonathan Bauermeister and Pablo Lanillos
2022
≈ 86%
Active inference body perception and action for humanoid robots
Guillermo Oliver, Pablo Lanillos, Gordon Cheng
2021
≈ 84%
Large Language Models Report Subjective Experience Under Self-Referential Processing
in corpus
2025
≈ 84%
Deep Active Inference
Kai Ueltzhöffer
2018
≈ 83%
Self-Attention Limits Working Memory Capacity of Transformer-Based Models
Dongyu Gong and Hantao Zhang
2024
≈ 83%
Prior Preference Learning from Experts: Designing a Reward with Active Inference
Jin Young Shin, Cheolhyeong Kim, Hyung Ju Hwang
2021
≈ 83%
Modeling Rapid Contextual Learning in the Visual Cortex with Fast-Weight Deep Autoencoder Networks
Yue Li, Weifan Wang, Tai Sing Lee
2025
≈ 83%
Predictions in the eye of the beholder: an active inference account of Watt governors
Manuel Baltieri, Christopher L. Buckley, Jelle Bruineberg
2022
≈ 83%
Active inference on discrete state-spaces: a synthesis
in corpus
2020
≈ 83%
SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
Hyeonbeom Choi, Daechul Ahn, Youhan Lee, Taewook Kang, Seongwon Cho, Jonghyun Choi
2026
≈ 82%
Active Inference and Intentional Behaviour
Karl J. Friston, Tommaso Salvatori, Takuya Isomura, Alexander Tschantz, Alex Kiefer, Tim Verbelen, Magnus Koudahl, Aswin Paul, Thomas Parr, Adeel Razi, Brett Kagan, Christopher L. Buckley, and Maxwell J. D. Ramstead
2023
≈ 82%
Gravity Prior and Temporal Horizon Shape Interceptive Behavior under Active Inference
Marta Russo, Antonella Maselli, Federico Maggiore, Giovanni Pezzulo
2025
≈ 82%
An Active Inference Model of Covert and Overt Visual Attention
Tin Mišić, Karlo Koledić, Fabio Bonsignorio, Ivan Petrović, and Ivan Marković
2025
≈ 82%
Active Inference, Curiosity and Insight
in corpus
2017
≈ 82%
Anima Labs Phenomenology Pt1
in corpus
≈ 82%
Inference of Affordances and Active Motor Control in Simulated Agents
Fedor Scholz, Christian Gumbsch, Sebastian Otte, Martin V. Butz
2022
≈ 82%
Reclaiming saliency: rhythmic precision-modulated action and perception
Ajith Anil Meera, Filip Novicky, Thomas Parr, Karl Friston, Pablo Lanillos and Noor Sajid
2022
≈ 82%
A Neural Active Inference Model of Perceptual-Motor Learning
Zhizhuo Yang, Gabriel J. Diaz, Brett R. Fajen, Reynold Bailey, Alexander Ororbia
2022
≈ 82%
Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation
Qiming Li, Zekai Ye, Xiaocheng Feng, Weihong Zhong, Weitao Ma, Xiachong Feng
2025
≈ 82%
Active Inference: A Process Theory
in corpus
2017
≈ 81%
Active inference: demystified and compared
in corpus
2021
≈ 81%
Life as we know it
in corpus
2013
≈ 81%
A Mathematical Framework for Transformer Circuits
in corpus
2021
≈ 81%
A Free energy principle for the brain (lecture summary)
in corpus
2008
≈ 81%
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
in corpus
2026
≈ 81%
A tale of two densities: active inference is enactive inference
in corpus
2020
≈ 80%
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
in corpus
2026
≈ 80%
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
in corpus
2024
≈ 80%
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
in corpus
2026
≈ 80%
Active inference and learning
cited
2016
≈ 80%
