paper:doi-10-48550-arxiv-2604-09673
Active Inference with a Self-Prior in the Mirror-Mark Task
TL;DR
Spontaneous mark-directed behavior in the mirror-mark task emerges from a single internal mechanism, the self-prior, combined with expected free energy minimization and no external reward signal. A simulated infant model built on the EMFANT platform (MuJoCo-based, 12 DoF) removes the sticker in approximately 70% of evaluation episodes across 80 runs, using only 64×64×3 RGB vision and 12-dimensional proprioception, with tactile input deliberately withheld. The self-prior is a GPT-like Transformer that autoregressively models the joint distribution over 32-class discrete latent variables and is trained almost entirely on sticker-free experiences (sticker-bearing episodes make up only about 5% of its training data). When a sticker is present, the latent state falls outside the learned high-density region, and the policy, built on STORM's DreamerV3-like world model extended with active inference, selects actions that drive the state back toward familiar self-experience. Across all 80 evaluations, expected free energy after sticker removal (67.33 ± 8.94) was significantly lower than before removal (79.33 ± 4.34; Wilcoxon signed-rank p = 6.33 × 10⁻⁹). Cross-modal sampling showed that visual self-appearance can be recovered from proprioception alone, which the paper takes as evidence that the self-prior acts as a probabilistic body schema. The paper argues this constitutes a computational implementation of Mitchell's inductive theory of mirror self-recognition and that the free energy principle can serve as a unifying, parsimonious hypothesis for investigating the developmental origins of self-awareness.
What to take away
1. A simulated 12-DoF infant agent on the EMFANT/MuJoCo platform removes a facial sticker in approximately 70% of evaluation episodes by 500k training steps, up from roughly 20% early in training, using only vision and proprioception with no tactile input.
2. Expected free energy after sticker removal (67.33 ± 8.94) is significantly lower than before removal (79.33 ± 4.34) across all 80 evaluations, confirmed by a Wilcoxon signed-rank test at p = 6.33 × 10⁻⁹.
3. The self-prior is implemented as a GPT-like autoregressive Transformer that models the joint distribution of a 32-class × 32-variable discrete latent state. It is trained almost exclusively on sticker-free episodes, so sticker-bearing observations receive low probability and thus high expected free energy.
4. The architecture extends STORM (a DreamerV3-like Transformer-based world model) by replacing external reward with expected free energy computed from the self-prior, omitting STORM's reward predictor and continuation predictor entirely.
5. Training proceeds in three progressive stages: world model training starts after episode 100, self-prior training after episode 120, and policy training after episode 140, with policy rollouts imagined over H = 16 future steps.
6. Self-prior training draws only about 5% of its data from sticker-bearing episodes, deliberately constructing a prior that represents the sticker-free self so that any sticker induces a detectable density mismatch.
7. Cross-modal sampling demonstrates that the self-prior captures visual–proprioceptive associations: a visual self-appearance can be reconstructed from proprioceptive input alone, so the self-prior functions as a probabilistic body schema consistent with the body schema theories of Paillard and Gallagher.
8. The paper raises an open question about whether occlusion of the sticker by the agent's own hand and kinematic reachability limits constitute the primary ceiling on performance, implying that the 70% success rate may reflect a physical ceiling rather than a cognitive one.
9. To replicate the methodology, sample the replay buffer so that sticker-bearing episodes constitute only ~5% of self-prior training data, while using the full dataset (50/50 sticker/no-sticker, 50/50 random/policy mix) for world model and policy training. Policy gradient clipping uses ZClip; the world model and self-prior are optimized with AdamW, and the policy and value networks with Adam.
10. The paper predicts that integrating the tactile modality would improve learning efficiency, motivated by Chinn et al.'s finding that tactile target experience accelerates mirror self-recognition in infants, and identifies this as the primary direction for future work.
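The staged schedule and the skewed self-prior data mix described in the takeaways can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the batch-level handling of the ~5% ratio, and the reading of "after episode N" as a strict inequality are all assumptions.

```python
import random

# Stage thresholds (in episodes) and the ~5% ratio come from the takeaways above.
WORLD_MODEL_START = 100
SELF_PRIOR_START = 120
POLICY_START = 140
STICKER_FRACTION_SELF_PRIOR = 0.05


def active_stages(episode):
    """Return which components train once `episode` episodes have been collected.

    Hypothetical helper: "after episode N" is read as episode > N.
    """
    return {
        "world_model": episode > WORLD_MODEL_START,
        "self_prior": episode > SELF_PRIOR_START,
        "policy": episode > POLICY_START,
    }


def sample_self_prior_batch(sticker_free, sticker_bearing, batch_size, rng):
    """Draw a batch in which about 5% of items come from sticker-bearing episodes.

    Hypothetical helper: the ratio is enforced per batch and sampling is with
    replacement; the source does not say how the ratio is enforced.
    """
    n_sticker = round(batch_size * STICKER_FRACTION_SELF_PRIOR)
    batch = rng.choices(sticker_bearing, k=n_sticker)
    batch += rng.choices(sticker_free, k=batch_size - n_sticker)
    rng.shuffle(batch)
    return batch
```

World model and policy training would instead sample from the full buffer, which the summary describes as a 50/50 sticker/no-sticker and 50/50 random/policy mix.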
Peer brief — for seminar discussion
Kim, Kanazawa, and Kuniyoshi model the mirror-mark task in a 12-DoF simulated infant on the EMFANT platform (MuJoCo physics) by combining a Transformer-based self-prior with active inference, training for 500k steps over 50,000 episodes and evaluating across 80 runs (8 seeds × 10 runs). The core method is the self-prior: a GPT-like autoregressive Transformer that learns the density of sticker-free multisensory latent states. It sits inside a STORM/DreamerV3-style world model in which external reward is replaced by expected free energy derived entirely from self-prior mismatch. The load-bearing finding is that sticker-removal probability rises from roughly 20% early in training to approximately 70% by 500k steps, while expected free energy after removal (67.33 ± 8.94) is significantly lower than before (79.33 ± 4.34; Wilcoxon p = 6.33 × 10⁻⁹). The authors read this as confirmation that the self-prior functions as an internal criterion for distinguishing self from non-self, without any explicit reward or coordinate-transform module. Cross-modal sampling further shows that visual appearance can be recovered from 12-dimensional proprioception alone, grounding the self-prior as a probabilistic body schema. An alternative would have been a direct coordinate-transform mapping from visual anomaly to motor command, as in Hoffmann et al.'s robot mirror work, but that approach requires an externally designed linkage rather than emergent motivation. The paper interprets these results as a computational implementation of Mitchell's inductive theory of mirror self-recognition, which requires kinesthetic-visual matching and implicit mirror correspondence learning. It positions the free energy principle as a parsimonious unifying hypothesis for developmental self-awareness research and maps the model to Level 3 (identification) of Rochat's five-level framework.
The most contestable point is the conflation of mark-directed behavior with mirror self-recognition. The failure analysis itself reveals that the agent sometimes reduces expected free energy by occluding the sticker visually rather than physically removing it. The mismatch criterion is therefore sensitive to perceptual change, not necessarily to bodily integrity, which calls into question whether the mechanism truly distinguishes self from non-self or merely tracks familiar low-entropy visual states. Additionally, the study is confined to simulation. Prior computational models of mirror self-recognition by Lanillos et al. and Hoffmann et al. have been validated on real robots, and whether the self-prior's latent-space density estimates would survive the noise and distribution shift of physical hardware remains untested.
Methods (2)
- Cross-Modal Sampling: Technique used to demonstrate that the self-prior captures visual–proprioceptive associations by recovering visual appearance from proprioception alone
- Sticker Removal Success Criterion: Operational definition: hand stays within 2 cm of sticker for 50 consecutive steps (0.5 seconds)
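The success criterion above is simple enough to state as code. The following is a minimal sketch, assuming positions are 3D coordinates in meters and one entry is recorded per simulation step; the function name and input format are hypothetical, and only the 2 cm and 50-step thresholds come from the definition.

```python
import math

DIST_THRESHOLD_M = 0.02  # 2 cm, assuming positions are expressed in meters
REQUIRED_STEPS = 50      # 50 consecutive steps (0.5 s per the definition above)


def first_success_step(hand_positions, sticker_positions):
    """Return the step index at which the criterion is first met, or None.

    The streak counter resets whenever the hand leaves the 2 cm radius,
    so only consecutive in-range steps count.
    """
    streak = 0
    for step, (hand, sticker) in enumerate(zip(hand_positions, sticker_positions)):
        if math.dist(hand, sticker) <= DIST_THRESHOLD_M:
            streak += 1
            if streak >= REQUIRED_STEPS:
                return step
        else:
            streak = 0
    return None
```

Fifty steps spanning 0.5 seconds implies a 100 Hz control rate, which is why the threshold is expressed in steps here instead of wall-clock time.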
Frameworks (3)
- Mirror Self-Recognition Test: The behavioral paradigm (mark/sticker placed on face, checked in mirror) used to evaluate self-awareness in animals and infants
- Self-Prior: The key novel contribution: an internal model that learns the density of familiar multisensory experiences and, under the free energy principle, drives mark-removal behavior through mismatch with that density
- STORM: The Transformer-based world model framework upon which the present architecture is built
Findings (10)
- Passing a sticker-bearing latent through the self-prior removes the sticker in the reconstruction, confirming that the learned distribution favors the sticker-free state
Shows the self-prior's generative distribution rejects sticker-bearing states
- In a single illustrative episode (seed 2) after 500k training steps, mean expected free energy (EFE) after sticker removal was 12.00 lower than before removal
Qualitative confirmation of the EFE drop in the trained model, in contrast to the untrained model (Δ = +1.70)
- Mean hand-sticker distance decreased gradually across 500k training steps, including before removal probability exceeded 50%
Suggests the agent learned to recognize and approach the sticker before achieving reliable removal
- Visual self-appearance can be recovered from proprioception alone via cross-modal sampling through the self-prior
Demonstrates the self-prior captures visual-proprioceptive associations, functioning as a probabilistic body schema
- EFE decrease after sticker removal is statistically significant (Wilcoxon p = 6.33×10⁻⁹) across 80 evaluations
Confirms that EFE systematically decreases after sticker removal, validating the self-prior as internal criterion
- Untrained model (0 training steps) shows no clear EFE difference before and after sticker removal (Δ = +1.70)
Control showing that the EFE signal is learned, not inherent to the architecture
- Agent achieves approximately 70% sticker-removal success rate by end of 500k training steps
Main behavioral result demonstrating the model's efficacy in the mirror-mark task
- Samples drawn from the trained self-prior correspond to sticker-free self in diverse poses
Demonstrates the self-prior learned the sticker-free body distribution as intended
- Sticker-removal success rate stayed near 20% in the early phase of training
Shows learning progression from chance-level to functional behavior
- Mean EFE before sticker removal across 80 evaluations: 79.33 ± 4.34
Baseline EFE when sticker is present, used for comparison
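The paired before/after comparison reported above relies on the Wilcoxon signed-rank test. The sketch below shows how such a test works on paired values, using only the standard library and a normal approximation. It is an illustration of the test itself and does not reproduce the paper's analysis; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used, and the per-episode EFE values are not available in this summary.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped and tied absolute differences share an
    average rank. No tie or continuity correction is applied to the variance,
    so this is a simplified version of the test.
    Returns (W_plus, p_value).
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value
```

When every pair decreases, `W_plus` reaches its maximum of n(n+1)/2 and the p-value shrinks as n grows, which is the pattern a systematic EFE drop across evaluations would produce.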
Claims (11)
- Mark-directed behavior by itself does not constitute evidence of higher-order self-consciousness; the present study is a computational hypothesis about key behavior, not a complete account of mirror self-recognition
Epistemic humility claim limiting the scope of the paper's contribution
- The self-prior is distinct from prior intrinsic motivation approaches because it models the density of the agent's own familiar multisensory experiences rather than improving exploration efficiency
Differentiates the self-prior from existing intrinsic motivation work
- The sticker-removal behavior induced by the self-prior corresponds to stimulus-elicited intention rather than endogenous intention, aligning with the developmental view of early intentional agency
Connects the model's behavior to Zaadnoordijk and Bayne's taxonomy of intentional agency
- The model constitutes a computational implementation of the inductive theory of mirror self-recognition, implementing kinesthetic-visual matching and implicit mirror correspondence learning
Claims the model satisfies the core requirements of Mitchell's inductive theory
- Density evaluation of the self-prior in latent space approximates density evaluation in observation space because the latent state is a sufficient statistic of the observation
Theoretical justification for implementing the self-prior in latent rather than observation space
- Failure episodes are primarily caused by hand occluding the sticker in the mirror or sticker leaving the visual field due to head rotation, plus kinematic reachability limits
Explains the ceiling on removal success as due to perceptual and kinematic constraints, not principled failures
- The self-prior is functionally analogous to the body schema: it captures cross-modal associations and directly guides action planning toward multisensory mismatch reduction
Theoretical interpretation linking the self-prior to the established body schema concept
- The self-prior operates as an internal criterion for distinguishing self from non-self, without external reward or explicitly computed sticker location
Central interpretive claim of the paper, supported by EFE decrease after sticker removal
- A sensory anomaly (sticker mismatch) can itself become an intrinsic drive for action under active inference, even without external reward
Core mechanism claim linking mismatch detection to behavior through EFE minimization
- The free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness
Broad theoretical claim connecting the model's success to the FEP as a unifying framework
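Several of the claims above rest on one idea: an autoregressive density model assigns low probability to latent states unlike those it was trained on, and that surprise can act as a drive. The sketch below illustrates the chain-rule surprise computation with a toy bigram model over discrete tokens. The bigram model is a deliberately simple stand-in for the paper's Transformer, and the class and method names are hypothetical. How the paper turns this density into expected free energy is not specified in this summary, so only the surprise term is shown.

```python
import math

NUM_CLASSES = 32  # classes per latent variable, as stated in the summary


class BigramSelfPrior:
    """Toy stand-in for the self-prior: p(z_k | z_{k-1}) from smoothed counts."""

    def __init__(self, num_classes=NUM_CLASSES):
        self.num_classes = num_classes
        self.counts = {}  # maps previous token (None at start) to {token: count}

    def fit(self, sequences):
        """Accumulate transition counts from familiar latent sequences."""
        for seq in sequences:
            prev = None
            for token in seq:
                row = self.counts.setdefault(prev, {})
                row[token] = row.get(token, 0) + 1
                prev = token

    def surprise(self, seq):
        """Negative log-likelihood of `seq` under the chain-rule factorization."""
        nll = 0.0
        prev = None
        for token in seq:
            row = self.counts.get(prev, {})
            total = sum(row.values())
            # Add-one smoothing keeps unseen transitions at nonzero probability.
            prob = (row.get(token, 0) + 1) / (total + self.num_classes)
            nll -= math.log(prob)
            prev = token
        return nll
```

Fitting the model on repeated familiar sequences and then scoring a sequence with one unfamiliar token yields a higher surprise for the altered sequence, which is the qualitative mismatch signal the claims describe.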
Hypotheses (2)
- Incorporating object permanence and objectification of body parts would enable exploration of developmental pathways toward Rochat's Level 4 and beyond
Future work hypothesis about extending the model to implement the deductive theory
- Integrating the tactile modality into the self-prior model may improve learning efficiency for mirror self-recognition
Forward-looking prediction based on Chinn et al.'s finding that tactile experience promotes earlier MSR in infants
Questions (3)
- How can mark-directed behavior emerge from the agent's own experience alone, without external reward or an explicit goal?
The central research question motivating the paper
- Does mark-directed behavior in animals like cleaner fish constitute genuine mirror self-recognition?
Active debate referenced to contextualize the limits of behavioral evidence
- To what level does this model actually implement mirror self-recognition?
Explicitly posed in the discussion to frame the theoretical contribution
Original abstract
The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual–proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror