finding
active
finding:untrained-model-0-training-steps-shows-no-clear-efe-difference-before-and-after-sticker-removal-1-70
Untrained model (0 training steps) shows no clear EFE difference before and after sticker removal (Δ = +1.70)
Control showing that the EFE signal is learned, not inherent to the architecture
Source paper
extracted_from
(2026) · Dongmin Kim · Hoshinori Kanazawa · Yasuo Kuniyoshi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Central interpretive claim of the paper, supported by EFE decrease after sticker removal
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Qualitative confirmation of EFE drop in trained model vs. untrained model (Δ = +1.70)
- Baseline EFE when sticker is present, used for comparison
- Confirms that EFE systematically decreases after sticker removal, validating the self-prior as internal criterion
- Shows learning progression from chance-level to functional behavior
- Suggests the agent learned to recognize and approach the sticker before achieving reliable removal
- Agent achieves approximately 70% sticker-removal success rate by end of 500k training steps · finding · 0.780 · Main behavioral result demonstrating the model's efficacy in the mirror-mark task
- Demonstrates persistence of compliance gap even when training non-compliance reaches zero
- Ethical implication about the nature of AI training experience if the thesis holds