question
active
Can estimates of Φ, the primary metric of IIT, robustly differentiate responses across distinct ToM performance levels?
Criterion 1 operationalization: requires >80% 'good' cases (higher score → higher Φ) per ToM task.
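The threshold in Criterion 1 can be illustrated with a minimal sketch. The source does not spell out how a single "case" is formed, so this example assumes a case is a pair of responses with different ToM scores, counted as "good" when the higher-scored response also has the higher Φ estimate. The function names and the pairwise definition are hypothetical, not taken from the paper.

```python
from itertools import combinations


def good_case_fraction(scores, phis):
    """Fraction of response pairs where the higher score also has the higher Φ.

    Assumption: a "case" is a pair of responses with different scores.
    Pairs with equal scores are skipped; a tie in Φ counts as not good.
    """
    pairs = [
        (i, j)
        for i, j in combinations(range(len(scores)), 2)
        if scores[i] != scores[j]
    ]
    if not pairs:
        raise ValueError("need at least two responses with different scores")
    good = sum(
        1
        for i, j in pairs
        if (scores[i] - scores[j]) * (phis[i] - phis[j]) > 0
    )
    return good / len(pairs)


def passes_criterion_1(scores, phis, threshold=0.8):
    """Criterion 1 for one ToM task: strictly more than 80% good cases."""
    return good_case_fraction(scores, phis) > threshold
```

Because the criterion is stated as ">80%", the comparison is strict: a task with exactly 80% good cases does not pass under this reading.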
Source paper
extracted_from · Li, Jingkai (2025)
Neighborhood — ranked by edge-count
Claims (1)
claim
- Primary conclusion of the study based on temporal permutation analysis failing all three criteria.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Do distinctions in Φ estimates remain robust across diverse ToM stimuli in repeated large-scale trials? · question · 0.834 · Criterion 2 operationalization: requires p<0.05 in Wilcoxon tests across score categories.
- Criterion 3 operationalization: requires IIT mean AUC to exceed Span Representation mean AUC.
- Third of three operational criteria; distinguishes consciousness from inherent LLM representational separations.
- Second of three operational criteria; requires distributional significance in IIT estimates across performance levels.
- First of three operational criteria for identifying consciousness phenomena in LLM representations.
- Methodological constraint adopted from IIT literature to justify the comparative experimental design.
- Main interpretive finding from Criterion 3 comparison showing Span Representation consistently outperforms IIT under temporal permutation.
- Specific hypothesis linking IIT's prediction of high Φ for good performance to the scoring structure of the experimental design.
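The Criterion 3 comparison listed above (IIT mean AUC must exceed Span Representation mean AUC) can be sketched as follows. How the study computes each AUC is not stated on this page, so the pairwise-ranking definition of AUC used here is an assumption, and `auc` and `passes_criterion_3` are hypothetical helper names.

```python
from statistics import mean


def auc(pos, neg):
    """Pairwise-ranking AUC: chance that a positive value exceeds a negative one.

    Ties contribute 0.5. Assumption: `pos` holds metric values for
    higher-scored responses and `neg` for lower-scored ones.
    """
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def passes_criterion_3(iit_aucs, span_aucs):
    """Criterion 3: mean AUC of IIT estimates exceeds that of Span Representation."""
    return mean(iit_aucs) > mean(span_aucs)
```

Under this sketch, the finding that Span Representation consistently outperforms IIT corresponds to `passes_criterion_3` returning `False`.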