claim

active

claim:failure-episodes-are-primarily-caused-by-hand-occluding-the-sticker-in-the-mirror-or-sticker-leaving-the-visual-field-due-to-head-rotation-plus-kinematic-reachability-limits

Failure episodes are primarily caused by the hand occluding the sticker in the mirror or by the sticker leaving the visual field due to head rotation, plus kinematic reachability limits

Attributes the ceiling on removal success to perceptual and kinematic constraints rather than to failures in principle

Source paper

extracted_from

Active Inference with a Self-Prior in the Mirror-Mark Task

(2026) · Dongmin Kim · Hoshinori Kanazawa · Yasuo Kuniyoshi

Neighborhood — ranked by edge-count

Findings (1)

finding

Agent achieves approximately 70% sticker-removal success rate by end of 500k training steps
cites
Main behavioral result demonstrating the model's efficacy in the mirror-mark task

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

A sensory anomaly (sticker mismatch) can itself become an intrinsic drive for action under active inference, even without external reward · claim · 0.751
Core mechanism claim linking mismatch detection to behavior through EFE minimization
Some failures may reflect prompt design rather than model limitations, but the underlying issue is one of reasoning rather than instruction-following. · claim · 0.737
Acknowledges the confound of not explicitly instructing models to track wealth, yet points to reasoning gaps given code agents avoid errors without prompts.
Overbidding, self-bidding spirals, and undisciplined bluffing characterise failure. · claim · 0.732
Concrete failure signatures extracted from traces.
Clamping code error feature to high activation causes the model to hallucinate error messages on bug-free code. · finding · 0.732
Causal effect: feature induces perception of bugs.
The sticker-removal behavior induced by the self-prior corresponds to stimulus-elicited intention rather than endogenous intention, aligning with the developmental view of early intentional agency · claim · 0.731
Connects the model's behavior to Zaadnoordijk and Bayne's taxonomy of intentional agency
Lesioning active, sensory, or internal states causes rapid structural disintegration and loss of spatial organization. · finding · 0.730
Demonstrates autopoietic maintenance: Markov blanket integrity is necessary for preserving internal state configuration.
Behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation that never appear in code agents. · claim · 0.719
LLMs exhibit systematic errors that deterministic logic avoids.
Overbid frequency, self-bidding rate, bankrupt-initiation patterns, and context-dependent offer calibration are failure modes invisible to both static evaluations and aggregate rankings like Elo · claim · 0.716
Key claim about the benchmark's unique diagnostic value
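
The "Related by similarity" list above (cosine ≥ 0.65, no typed edge) can be read as a simple filter over entity embeddings. The following is a minimal sketch of that idea, not the page's actual implementation: the function names, the dictionary of embeddings, and the representation of typed edges as a set of name pairs are all assumptions made for illustration; only the 0.65 threshold comes from the listing itself.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs, highest score first.

    embeddings  -- hypothetical dict: entity name -> embedding vector
    typed_edges -- hypothetical set of (name, name) pairs that already
                   have a typed relation (e.g. cites, extracted_from)
    """
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target by a typed edge.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, an entity that is both highly similar and already connected by a typed edge (such as the finding listed under "Findings") is excluded, which is why the list surfaces only candidates for new edges or unrecognized duplicates.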