concept
active
concept:criterion-3-iit-estimates-must-achieve-higher-mean-auc-than-span-representation-for-tom-score-classification
Criterion 3: IIT estimates must achieve higher mean AUC than Span Representation for ToM score classification.
Third of three operational criteria; it distinguishes consciousness phenomena from separations that are inherent in LLM representations.
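The comparison behind this criterion can be sketched in a few lines. This is a minimal illustration, not the source's implementation: it assumes binary ToM score labels (the source may use more categories), computes AUC by direct pairwise comparison, and uses made-up toy scores. The names `binary_auc` and `criterion_3_met` are hypothetical.

```python
from statistics import mean


def binary_auc(scores, labels):
    """AUC as the probability that a positive example outscores a negative one.

    Ties count as half a win. Assumes labels are 0 or 1 (an assumption of
    this sketch; the source does not specify the label encoding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def criterion_3_met(iit_aucs, span_aucs):
    """True when the mean IIT AUC strictly exceeds the mean Span Representation AUC."""
    return mean(iit_aucs) > mean(span_aucs)


# Hypothetical toy data: one task, four examples.
labels = [0, 0, 1, 1]
iit_scores = [0.1, 0.4, 0.35, 0.8]   # stand-in for IIT estimates
span_scores = [0.2, 0.3, 0.6, 0.9]   # stand-in for Span Representation scores

iit_auc = binary_auc(iit_scores, labels)
span_auc = binary_auc(span_scores, labels)
passed = criterion_3_met([iit_auc], [span_auc])
```

In practice the two lists passed to `criterion_3_met` would hold one AUC per task or configuration, so the criterion compares averages rather than a single pair of values.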
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Span Representation Analysis · associated_with · Framework for characterizing span-level information of sequences of representations, independent of any consciousness estimate; used as a comparison baseline.
Concepts (1)
concept
- Criterion 2: Statistically significant Φ value differences (p<0.05) across ToM score categories via Wilcoxon test. · associated_with · Second of three operational criteria; requires distributional significance in IIT estimates across performance levels.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Criterion 3 operationalization: requires IIT mean AUC to exceed Span Representation mean AUC.
- Contrasts with temporal permutation where Span Representation dominates; suggests spatio permutation reveals different dynamics.
- Main interpretive finding from Criterion 3 comparison showing Span Representation consistently outperforms IIT under temporal permutation.
- Criterion 1 operationalization: requires >80% 'good' cases (higher score → higher Φ) per ToM task.
- First of three operational criteria for identifying consciousness phenomena in LLM representations.
- Even in the rare cases where good > bad, the share of good cases does not reach the 80% threshold required by Criterion 1.
- Suggests LLMs do not represent complement/MSV linguistic features in the way that makes those features crucial for human ToM development.
- Specific prediction linking IIT's expectation of high Φ for good performance to the experimental design's scoring structure.