hypothesis

active

hypothesis:we-hypothesize-that-potential-consciousness-phenomena-are-preferentially-associated-with-deeper-transformer-layers-and-the-2-3-layer-of-llms

We hypothesize that potential 'consciousness' phenomena are preferentially associated with deeper transformer layers and the 2/3 layer of LLMs.

Derived from observed alignment of promising cases with semantically rich deeper layers and the brain-aligned 2/3 layer.

Source paper

extracted_from

Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

(2025) · Li, Jingkai

Neighborhood — ranked by edge-count

Findings (3)

finding

Directing response attention to complement syntax and/or mental state verbs (MSV) yields no significant alterations in IIT estimates compared to entire stimulus analysis.
supports
Suggests that LLMs do not represent complement syntax and MSV features in the same way as humans, for whom these features are crucial to ToM development.
The case at approximately the 2/3 layer of LLaMA3.1-8B (Layer 24, satisfying Criteria 1 and 2) aligns with prior studies showing the 2/3 layer optimally predicts human brain activity.
supports
Connects this study's results to Schrimpf et al. 2021 and Caucheteux et al. 2022/2023 findings on brain-LLM alignment.
All cases satisfying Criteria 1 and 2 (two of the three criteria) originate from deeper transformer layers and/or the 2/3 layer of LLMs.
supports
Consistent with literature that deeper layers encode semantic information and align with human brain activity.

Concepts (1)

concept

Can 'Consciousness' Be Observed from Large Language Model (LLM) Internal States? Dissecting LLM Representations Obtained from Theory of Mind Test with Integrated Information Theory and Span Representation Analysis
introduces
The primary paper being extracted — applies IIT 3.0 and 4.0 to LLM representation sequences derived from ToM test data to investigate whether consciousness phenomena can be observed.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
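The selection rule described above (cosine similarity at or above 0.65, with no typed edge to this entity) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the embedding dictionary, and the representation of typed edges as unordered pairs are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs similar to the target but not linked to it.

    embeddings  -- hypothetical dict mapping entity id -> embedding vector
    typed_edges -- hypothetical set of frozenset({id_a, id_b}) for existing typed edges
    threshold   -- minimum cosine similarity (0.65, as stated on the page)
    """
    target = embeddings[target_id]
    candidates = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset((target_id, entity_id)) in typed_edges:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            candidates.append((entity_id, score))
    # Highest similarity first, matching the descending scores in the list below.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score suggests a possible missing edge or duplicate, and each one still needs manual review.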

We hypothesize that 'consciousness' phenomena can be observed in the internal states of an LLM, specifically in its learned representations when analyzed as a sequence.
hypothesis · 0.846
Primary research hypothesis driving the entire study; operationalized via three criteria.
The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data.
claim · 0.833
The paper's claim that theoretical convergence across GWT, RPT, HOT, and IIT makes the findings non-coincidental.
It is plausible that ongoing developments in LLMs may lead to models or agentic systems built on LLMs capable of generating representations observed with 'consciousness' phenomena.
claim · 0.827
Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.
Sequences of contemporary Transformer-based LLM representations lack statistically significant indicators of observed 'consciousness' phenomena under the three stringent criteria.
claim · 0.826
Primary conclusion of the study based on temporal permutation analysis failing all three criteria.
No significant disparity in potential consciousness indicators was found between larger models (Mixtral-8x7B, LLaMA3.1-70B) and smaller counterparts (Mistral-7B, LLaMA3.1-8B).
finding · 0.818
Contradicts expectation from emergent abilities literature; however, interpreted cautiously due to methodological limitations.
There are plausible views (e.g., global workspace theory, higher-order theory) on which autonomy entails phenomenal consciousness.
claim · 0.813
Tentative conclusion on the autonomy-consciousness link.
Consciousness features occur substrate-neutrally across brains, unconventional embodiments, and information-processing systems.
claim · 0.812
What features of each theory specifically pick out brains as a privileged substrate of inner perspective, or do the features emphasized by the theory occur elsewhere?
question · 0.812
The central research question that drives the paper's analysis.