concept
active
concept:criterion-2-statistically-significant-value-differences-p-0-05-across-tom-score-categories-via-wilcoxon-test
Criterion 2: Statistically significant Φ value differences (p<0.05) across ToM score categories via Wilcoxon test.
Second of three operational criteria; requires that the distributions of IIT (Φ) estimates differ significantly (p<0.05) between ToM performance levels.
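A minimal sketch of how this criterion could be checked for one pair of score categories. The source says only "Wilcoxon test"; this sketch assumes the independent-samples rank-sum variant with a tie-corrected normal approximation and no continuity correction. The function name `rank_sum_test` and the Φ values are hypothetical illustrations, not data or code from the study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (assumed variant), normal approximation.

    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])

    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p

# Hypothetical Φ estimates for two ToM score categories.
phi_low_score = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.16, 0.09]
phi_high_score = [0.30, 0.28, 0.35, 0.31, 0.29, 0.33, 0.32, 0.27]

u, p = rank_sum_test(phi_low_score, phi_high_score)
criterion_2_met = p < 0.05
```

With more than two score categories, the same check would be repeated per pair of categories; whether and how the study corrects for multiple comparisons is not stated here. For small samples an exact test is preferable to the normal approximation.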
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Criterion 1: Φ estimates must yield >80% 'good' cases (higher score → higher Φ) per ToM task to indicate potential consciousness. · associated_with · First of three operational criteria for identifying consciousness phenomena in LLM representations.
- Criterion 3: IIT estimates must achieve higher mean AUC than Span Representation for ToM score classification. · associated_with · Third of three operational criteria; distinguishes consciousness from inherent LLM representational separations.
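The two neighboring criteria reduce to simple threshold checks, sketched below under stated assumptions. A "case" is assumed here to be a pair of Φ estimates for a lower-scoring and a higher-scoring response on the same ToM task; the function names, the pairing scheme, and the sample values are hypothetical.

```python
from statistics import mean

def criterion_1_met(cases, threshold=0.8):
    """cases: list of (phi_lower_score, phi_higher_score) pairs for one ToM task.

    A case is 'good' when the higher score has the higher Φ.
    The criterion requires strictly more than `threshold` good cases.
    """
    good = sum(1 for phi_low, phi_high in cases if phi_high > phi_low)
    return good / len(cases) > threshold

def criterion_3_met(iit_aucs, span_aucs):
    """Mean AUC of IIT estimates must exceed that of Span Representation."""
    return mean(iit_aucs) > mean(span_aucs)

# Hypothetical inputs: 9 of 10 cases are 'good'; Span Representation has higher AUC.
cases = [(0.10, 0.20)] * 9 + [(0.30, 0.20)]
c1 = criterion_1_met(cases)
c3 = criterion_3_met([0.60, 0.70], [0.80, 0.75])
```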
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Do distinctions in Φ estimates remain robust across diverse ToM stimuli in repeated large-scale trials? · question · 0.782 · Criterion 2 operationalization: requires p<0.05 in Wilcoxon tests across score categories.
- Criterion 1 operationalization: requires >80% 'good' cases (higher score → higher Φ) per ToM task.
- Non-parametric statistical test used to assess significance of Φ differences between ToM score categories.
- Specific prediction linking IIT's prediction of high Φ for good performance to the experimental design's scoring structure.
- Main interpretive finding from Criterion 3 comparison showing Span Representation consistently outperforms IIT under temporal permutation.
- Even in the rare cases where good > bad, the proportion of 'good' cases does not reach the 80% threshold required by Criterion 1.
- Shows that Burger et al.'s layer choice corresponds to a transitional phase, not a universal property.
- Only Criterion 2 is satisfied for this single case at the task level (granularity without aggregation).