2026
paper:doi-10-48550-arxiv-2604-09673

Active Inference with a Self-Prior in the Mirror-Mark Task

TL;DR

Spontaneous mark-directed behavior in the mirror-mark task emerges from a single internal mechanism, the self-prior, combined with expected free energy minimization and no external reward signal. A simulated infant model built on the EMFANT platform (MuJoCo-based, 12 DoF) removes the sticker in approximately 70% of evaluation episodes across 80 runs, using only 64×64×3 RGB vision and 12-dimensional proprioception, with tactile input deliberately withheld. The self-prior is a GPT-like Transformer that autoregressively models the joint distribution over 32-class discrete latent variables and is trained almost exclusively on sticker-free experience (about 5% of its training episodes contain a sticker). When a sticker is present, the latent state falls outside the learned high-density region, and the policy, built on STORM's DreamerV3-like world model extended with active inference, selects actions that drive the state back toward familiar self-experience. Across all 80 evaluations, expected free energy after sticker removal (67.33 ± 8.94) was significantly lower than before removal (79.33 ± 4.34; Wilcoxon signed-rank p = 6.33 × 10⁻⁹). Cross-modal sampling showed that visual self-appearance can be recovered from proprioception alone, which supports reading the self-prior as a probabilistic body schema. The paper argues that this constitutes a computational implementation of Mitchell's inductive theory of mirror self-recognition and that the free energy principle can serve as a unifying, parsimonious hypothesis for investigating the developmental origins of self-awareness.

What to take away

  1. A simulated 12-DoF infant agent on the EMFANT/MuJoCo platform removes a facial sticker in approximately 70% of evaluation episodes by 500k training steps, up from roughly 20% early in training, using only vision and proprioception with no tactile input.
  2. Expected free energy after sticker removal (67.33 ± 8.94) is significantly lower than before removal (79.33 ± 4.34) across all 80 evaluations, confirmed by a Wilcoxon signed-rank test at p = 6.33 × 10⁻⁹.
  3. The self-prior is implemented as a GPT-like autoregressive Transformer that models the joint distribution of a discrete latent state (32 variables × 32 classes each). It is trained almost exclusively on sticker-free episodes, so sticker-bearing observations receive low probability and thus high expected free energy.
  4. The architecture extends STORM (a DreamerV3-like Transformer-based world model) by replacing external reward with expected free energy computed from the self-prior, omitting STORM's reward predictor and continuation predictor entirely.
  5. Training proceeds in three progressive stages: world model training starts after episode 100, self-prior training after episode 120, and policy training after episode 140, with policy rollouts imagined over H = 16 future steps.
  6. Self-prior training draws only about 5% of its episodes from sticker-bearing data, deliberately constructing a prior that represents the sticker-free self so that any sticker induces a detectable density mismatch.
  7. Cross-modal sampling demonstrates that the self-prior captures visual–proprioceptive associations: a visual self-appearance can be reconstructed from proprioceptive input alone, functioning as a probabilistic body schema consistent with Paillard/Gallagher body schema theories.
  8. The paper raises an open question: whether occlusion of the sticker by the agent's own hand and kinematic reachability limits are the primary ceiling on performance. If so, the 70% success rate may reflect a physical ceiling rather than a cognitive one.
  9. To replicate the methodology, sample the replay buffer so that sticker-bearing episodes make up only ~5% of self-prior training data, while using the full dataset (50/50 sticker/no-sticker, 50/50 random/policy mix) for world model and policy training. Policy gradients are clipped with ZClip; AdamW is used for the world model and self-prior, and Adam for the policy and value networks.
  10. The paper predicts that integrating a tactile modality would improve learning efficiency, motivated by Chinn et al.'s finding that tactile target experience accelerates mirror self-recognition in infants, and identifies this as the primary direction for future work.
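
The staged schedule and the ~5% sticker mix described in the list above can be sketched as follows. This is a minimal illustration, not the authors' code: the function and constant names are hypothetical, "after episode N" is assumed to mean `episode > N`, and sampling with replacement is an assumption.

```python
import random

# Thresholds and fraction come from the summary; all names are hypothetical.
WORLD_MODEL_START = 100
SELF_PRIOR_START = 120
POLICY_START = 140
STICKER_FRACTION_SELF_PRIOR = 0.05  # ~5% sticker-bearing episodes


def active_stages(episode):
    """Return which components are being trained at a given episode index."""
    stages = []
    if episode > WORLD_MODEL_START:
        stages.append("world_model")
    if episode > SELF_PRIOR_START:
        stages.append("self_prior")
    if episode > POLICY_START:
        stages.append("policy")
    return stages


def sample_self_prior_batch(clean, sticker, batch_size, rng):
    """Draw a batch in which about 5% of the episodes carry a sticker."""
    n_sticker = round(batch_size * STICKER_FRACTION_SELF_PRIOR)
    n_clean = batch_size - n_sticker
    # Sampling with replacement is an assumption made for this sketch.
    batch = rng.choices(sticker, k=n_sticker) + rng.choices(clean, k=n_clean)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
clean_episodes = [("clean", i) for i in range(50)]
sticker_episodes = [("sticker", i) for i in range(50)]
batch = sample_self_prior_batch(clean_episodes, sticker_episodes, 100, rng)
```

The world model and policy would instead draw from the full buffer without this re-weighting, per item 9.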

Peer brief — for seminar discussion

Kim, Kanazawa, and Kuniyoshi model the mirror-mark task in a 12-DoF simulated infant on the EMFANT platform (MuJoCo physics) by combining a Transformer-based self-prior with active inference, training for 500k steps over 50,000 episodes and evaluating across 80 runs (8 seeds × 10 runs). The core method is the self-prior: a GPT-like autoregressive Transformer that learns the density of sticker-free multisensory latent states, implemented within a STORM/DreamerV3-style world model in which external reward is replaced by expected free energy derived entirely from self-prior mismatch. The load-bearing finding is that sticker-removal probability rises from roughly 20% early in training to approximately 70% by 500k steps, while expected free energy after removal (67.33 ± 8.94) is significantly lower than before (79.33 ± 4.34; Wilcoxon p = 6.33 × 10⁻⁹). The authors take this as confirmation that the self-prior functions as an internal criterion for distinguishing self from non-self, without any explicit reward or coordinate-transform module. Cross-modal sampling further shows that visual appearance can be recovered from 12-dimensional proprioception alone, grounding the self-prior as a probabilistic body schema. One alternative is a direct coordinate-transform mapping from visual anomaly to motor command, as in Hoffmann et al.'s robot mirror work, but that approach requires an externally designed linkage rather than emergent motivation. The paper interprets these results as a computational implementation of Mitchell's inductive theory of mirror self-recognition, which requires kinesthetic-visual matching and implicit mirror correspondence learning. It positions the free energy principle as a parsimonious unifying hypothesis for developmental self-awareness research and maps the model to Level 3 (identification) of Rochat's five-level framework.
The most contestable point is the conflation of mark-directed behavior with mirror self-recognition. The failure analysis itself reveals that the agent sometimes reduces expected free energy by occluding the sticker visually rather than physically removing it. The mismatch criterion is therefore sensitive to perceptual change rather than necessarily to bodily integrity, which calls into question whether the mechanism truly distinguishes self from non-self or merely tracks familiar low-entropy visual states. Additionally, the study is confined to simulation; prior computational models of mirror self-recognition by Lanillos et al. and Hoffmann et al. have been validated on real robots, and whether the self-prior's latent-space density estimates would survive the noise and distribution shift of physical hardware remains untested.
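
The paired before/after comparison of expected free energy cited in this brief can be sketched with a hand-rolled Wilcoxon signed-rank test. This is an illustrative sketch using the normal approximation and the Python standard library only. The data are synthetic draws that merely reuse the reported means and standard deviations; they are not the paper's measurements, so the resulting p-value will not match the reported one.

```python
import math
import random


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences share their average rank. No tie or
    continuity correction is applied to the variance.
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Synthetic stand-in data (assumption): 80 paired evaluations.
rng = random.Random(0)
before = [rng.gauss(79.33, 4.34) for _ in range(80)]
after = [rng.gauss(67.33, 8.94) for _ in range(80)]
w_plus, w_minus, p = wilcoxon_signed_rank(before, after)
```

With real data, a library routine such as `scipy.stats.wilcoxon` would be the more robust choice, since it handles ties and small samples more carefully.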

Methods (2)

  • Cross-Modal Sampling
    Technique used to demonstrate that the self-prior captures visual–proprioceptive associations by recovering visual appearance from proprioception alone
  • Sticker Removal Success Criterion
    Operational definition: hand stays within 2 cm of sticker for 50 consecutive steps (0.5 seconds)
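
The success criterion above lends itself to a short check. The following is a minimal sketch, assuming positions are given in metres as per-step coordinate tuples and that "within 2 cm" is inclusive; the function and constant names are hypothetical.

```python
import math

DIST_THRESHOLD_M = 0.02  # 2 cm, from the operational definition
REQUIRED_STEPS = 50      # 50 consecutive steps (0.5 s per the summary)


def sticker_removed(hand_positions, sticker_positions):
    """Return True once the hand stays within 2 cm for 50 consecutive steps."""
    streak = 0
    for hand, sticker in zip(hand_positions, sticker_positions):
        if math.dist(hand, sticker) <= DIST_THRESHOLD_M:
            streak += 1
            if streak >= REQUIRED_STEPS:
                return True
        else:
            streak = 0  # any step outside the radius resets the count
    return False


# Usage: a hand hovering 1 cm from a fixed sticker for 50 steps succeeds.
sticker_track = [(0.0, 0.0, 0.0)] * 50
hand_track = [(0.01, 0.0, 0.0)] * 50
succeeded = sticker_removed(hand_track, sticker_track)
```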

Frameworks (3)

  • Mirror Self-Recognition Test
    The behavioral paradigm (mark/sticker placed on face, checked in mirror) used to evaluate self-awareness in animals and infants
  • Self-Prior
    The key novel contribution: an internal model that learns the density of familiar multisensory experiences and drives mark-removal behavior when observations mismatch that density, via active inference under the free energy principle
  • STORM
    The transformer-based world model framework upon which the present architecture is built
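
The density-mismatch idea behind the self-prior can be illustrated with a toy autoregressive model over discrete latents. This sketch swaps the paper's GPT-like Transformer for a smoothed bigram model, so it only demonstrates the chain-rule surprisal, not the actual architecture or the full expected free energy computation. The class name, smoothing constant, and toy data are assumptions.

```python
import math
from collections import defaultdict

NUM_CLASSES = 32  # classes per latent variable, as in the summary


class BigramSelfPrior:
    """Toy stand-in for the Transformer: p(z_i | z_{i-1}) from smoothed counts."""

    def __init__(self, num_classes=NUM_CLASSES, alpha=1.0):
        self.num_classes = num_classes
        self.alpha = alpha  # Laplace smoothing constant (assumption)
        self.counts = defaultdict(lambda: defaultdict(float))

    def fit(self, latents):
        """Count transitions in familiar (sticker-free) latent sequences."""
        for z in latents:
            prev = None  # None marks the start of a sequence
            for token in z:
                self.counts[prev][token] += 1.0
                prev = token

    def surprisal(self, z):
        """Negative log-likelihood -log p(z) via the autoregressive chain rule."""
        nll, prev = 0.0, None
        for token in z:
            row = self.counts.get(prev, {})
            total = sum(row.values()) + self.alpha * self.num_classes
            nll -= math.log((row.get(token, 0.0) + self.alpha) / total)
            prev = token
        return nll


# Toy data: one familiar 32-variable latent, and a copy with one altered variable.
familiar = list(range(32))
novel = list(familiar)
novel[10] = 5  # stands in for the latent change a sticker might induce

prior = BigramSelfPrior()
prior.fit([familiar] * 20)
familiar_surprisal = prior.surprisal(familiar)
novel_surprisal = prior.surprisal(novel)
```

In this toy setting the altered latent receives higher surprisal than the familiar one, which is the signal a policy would act to reduce.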

Original abstract

The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual–proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror
