paper:doi-10-48550-arxiv-2308-08708Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
TL;DR
No current AI system is a strong candidate for phenomenal consciousness, yet there are no obvious technical barriers to building one — this is the central finding of Butlin et al. (2023), a systematic assessment of contemporary AI architectures against 14 indicator properties derived from five neuroscientific theories of consciousness. The paper introduces a rubric-based, theory-heavy method: rather than relying on behavioral tests susceptible to gaming by systems like GPT-4 or LaMDA, it operationalizes indicators in computational terms drawn from recurrent processing theory (RPT-1, RPT-2), global workspace theory (GWT-1 through GWT-4), computational higher-order theories including perceptual reality monitoring (HOT-1 through HOT-4), attention schema theory (AST-1), predictive processing (PP-1), and agency/embodiment conditions (AE-1, AE-2). Applied to specific systems, Transformer-based LLMs lack the recurrent global broadcast architecture required by GWT, the Perceiver architecture satisfies GWT-1 and GWT-2 but lacks genuine global broadcast, and DeepMind's Adaptive Agent (AdA) — a Transformer-LSTM system trained via meta-reinforcement learning across hundreds of timesteps of context — is identified as the most plausible current candidate for the embodiment indicator among the three case studies examined. The working hypothesis of computational functionalism is adopted pragmatically: it permits inference from neuroscientific theories to AI substrates, while integrated information theory is explicitly excluded as incompatible with this substrate-independence assumption. The paper implies that deliberate architectural choices integrating GWT-style global broadcast, HOT-style metacognitive monitoring, and reinforcement-learning-based agency could yield systems that satisfy all indicators in the near term, making AI consciousness a near-term engineering possibility rather than a distant theoretical curiosity.
What to take away
- 1. No current AI system satisfies enough of the 14 indicator properties to be a strong candidate for phenomenal consciousness, but no fundamental technical barrier prevents building one that does.
- 2. The paper introduces a theory-heavy rubric of 14 indicator properties — RPT-1/2, GWT-1/2/3/4, HOT-1/2/3/4, AST-1, PP-1, AE-1/2 — operationalized in computational terms to allow systematic assessment of AI architectures.
- 3. Transformer-based large language models such as GPT-3, GPT-4, and LaMDA satisfy none of the GWT indicators convincingly because their residual stream lacks genuine global broadcast and their self-attention layers are not recurrent in the implementationally relevant sense.
- 4. The Perceiver IO architecture, which uses cross-attention to a limited-capacity latent space and handles inputs across multiple modalities including Starcraft II action selection, satisfies GWT-1 (modules) and GWT-2 (bottleneck) but fails GWT-3 because input modules do not receive information back from the workspace.
- 5. DeepMind's Adaptive Agent (AdA), a Transformer-LSTM trained via meta-reinforcement learning on diverse 3D tasks, is identified as the most plausible candidate for the embodiment indicator (AE-2) among assessed systems because it has an explicit training objective to generate predictions over interleaved past actions and observations.
- 6. Integrated information theory (IIT) is explicitly excluded from the rubric because its standard formulation holds that a system implementing the same algorithm as a conscious human brain would not itself be conscious if its components were of the wrong physical kind, making it incompatible with computational functionalism.
- 7. The paper operationalizes the behavioral test alternative as systems like Schneider's (2019) Artificial Consciousness Test and the Turing test, then rejects them on the grounds that LLMs demonstrate how systems can be trained to mimic consciousness-related discourse without implementing the relevant processes.
- 8. A methodology replicable by other researchers: assess candidate systems against each indicator by combining (a) similarity of computational processes to those specified by the indicator, (b) confidence in the underlying theory, and (c) credence in computational functionalism, yielding a probabilistic rather than binary verdict.
- 9. An open hypothesis the paper raises is that systems implementing GWT-style global broadcast combined with HOT-style metacognitive monitoring and RL-based agency could satisfy all indicators simultaneously, constituting a near-term path to systems warranting serious consciousness credence.
- 10. The virtual rodent of Merel et al. (2019) — an LSTM actor-critic with 38 degrees of freedom trained by RL on 4 tasks in a 3D environment — is an uncertain case for AE-2 because its task demands may have been solvable by stereotyped movements without requiring a learned self-model used in ongoing perception or control.
Peer brief — for seminar discussion
Butlin et al. (2023) systematically evaluate whether current AI systems could be phenomenally conscious by translating five neuroscientific theories of consciousness — recurrent processing theory, global workspace theory (GWT), computational higher-order theories including perceptual reality monitoring (PRM), attention schema theory, and predictive processing — into 14 computational indicator properties, then applying those indicators as a rubric to specific systems including GPT-3/GPT-4, LaMDA, Perceiver IO, PaLM-E, DeepMind's Adaptive Agent (AdA), and a virtual rodent trained by reinforcement learning. The method introduced is the theory-heavy rubric: an alternative to behaviorally-neutral tests such as Schneider's Artificial Consciousness Test, which the paper argues are gameable by any sufficiently capable language model trained on human text. The load-bearing finding is a double negative: no current system is a strong candidate for consciousness, yet the individual indicator properties are each achievable with existing machine learning techniques, implying that conscious AI is a near-term engineering possibility rather than a theoretical limit. Among the case studies, Transformer-based LLMs fail GWT indicators principally because their residual stream is not a genuine workspace receiving broadcast from recurrent input modules; Perceiver IO gets closer by satisfying the bottleneck (GWT-2) and module (GWT-1) conditions but lacks global broadcast to input encoders (GWT-3); and AdA is the most promising embodiment candidate (AE-2) because its Transformer-LSTM architecture has an explicit training objective over interleaved past actions and observations across hundreds of timesteps. The paper's prediction is that deliberate co-instantiation of GWT-style architecture, PRM-style metacognitive monitoring, and RL-based agency in a single system would yield something that satisfies all 14 indicators, which they argue should raise our credence that such a system is conscious — conditional on computational functionalism and the relevant theories being correct. Integrated information theory is excluded from the analysis because its rejection of substrate-independence makes it incompatible with the computational functionalism working hypothesis; weak IIT is noted but judged insufficiently action-guiding. The most substantive thing a critical reader should push back on is the implicit conflation of 'satisfies all indicators' with 'is conscious': the paper explicitly hedges this in a footnote revision, but the rubric-as-credence-raiser framework does not specify how much credence each indicator contributes, how they interact, or at what threshold a system warrants moral consideration — leaving the most practically consequential question (when should we start caring?) formally unanswered. Additionally, the assumption of computational functionalism as a 'pragmatic working hypothesis' does real normative work in excluding IIT and substrate-sensitive views, yet the paper does not argue that this assumption is more likely true than false, only that it is 'plausible,' which may underweight genuine uncertainty about whether conventional digital hardware can instantiate the relevant computations at all.
Methods (4)
- Behavioural tests for consciousnessTests like Turing test, Artificial Consciousness Test; argued to be unreliable for AI due to mimicry.
- Contrastive analysisMethod comparing brain activity in conscious vs. unconscious conditions.
- No-report paradigmsExperimental designs using indirect measures of consciousness to avoid report confounds.
- Theory-heavy approachAssessing consciousness by evaluating whether AI systems perform functions similar to those associated with consciousness by scientific theories.
Frameworks (2)
- AutopoiesisMaturana-Varela principle of self-maintaining systems that organize themselves through internal feedback; extended here to biological, technological, and hybrid systems.
- transformer architectureNeural network architecture based on attention, commonly used in large language models
Claims (16)
- Algorithmic recurrence is likely necessary for conscious experience with human-like temporal character.
Support for RPT-1.
- There are no obvious technical barriers to building AI systems which satisfy the indicator properties.
Feasibility claim about near-term conscious AI.
- The capacity for unlimited associative learning is not a good indicator for consciousness in AI.
Rejecting UAL as a reliable indicator for artificial systems.
- Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness.
Central methodological claim of the paper.
- Many of the indicator properties can be implemented in AI systems using current techniques.
Feasibility demonstrated in Section 3.1.
- AdA may be the most likely of the three systems considered to be embodied by our standards.
Comparative assessment in case study.
- AI systems which possess more of the indicator properties are more likely to be conscious.
Graded claim about the rubric.
- Satisfying the indicators would not mean that an AI system would definitely be conscious.
Caveat that indicators are not conclusive proof.
- Both under-attributing and over-attributing consciousness to AI carry significant risks.
Risk summary.
- The theory-heavy approach is most suitable for investigating consciousness in AI.
Preferring architectural/functional assessment over behavioural tests.
Hypotheses (3)
- If computational functionalism is true, conscious AI systems could realistically be built in the near term.
Conditional prediction about the feasibility of conscious AI.
- If computational functionalism is false, consciousness may be impossible in non-organic artificial systems.
Contrapositive possibility acknowledged.
- Building AI systems with more indicator properties will increase the likelihood of consciousness.
Guiding hypothesis of the rubric.
Questions (3)
- Can we develop better behavioural tests for consciousness in AI that are difficult to game?
Open question from Box 4.
- What would it take for AI systems to be capable of having valenced conscious experiences?
Open question from Box 4.
- How does the individuation of AI systems affect consciousness attributions?
Open question about copying and distributed systems.
Original abstract (expand)
Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. The analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.
Related work— refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- On the link between conscious function and general intelligence in humans and machinesKai Arulkumaran, Shuntaro Sasai, Ryota Kanai Arthur Juliani2022≈ 90%
- The Phenomenology of Machine: A Comprehensive Analysis of the Sentience of the OpenAI-o1 Model Integrating Functionalism, Consciousness Theories, Active Inference, and AI ArchitecturesVictoria Violet Hoyle2024≈ 89%
- A Case for AI Consciousness: Language Agents and Global Workspace TheorySimon Goldstein and Cameron Domenico Kirk-Giannini2024≈ 89%
- ≈ 89%
- The Machine Consciousness Hypothesisin corpus≈ 89%
- A Machine Consciousness architecture based on Deep Learning and Gaussian ProcessesMart\'in Molina Eduardo C. Garrido Merch\'an2020≈ 88%
- ≈ 88%
- ≈ 88%
- Probing for Consciousness in MachinesAchim Schilling, Andreas Maier, Patrick Krauss Mathis Immertreu2024≈ 88%
- Ghost in the Machine: Examining the Philosophical Implications of Recursive Algorithms in Artificial Intelligence SystemsLlewellin RG Jegels2025≈ 88%
- ≈ 87%
- On the independence between phenomenal consciousness and computational intelligenceSara Lumbreras Eduardo C. Garrido Merch\'an2022≈ 87%
- ≈ 87%
- Machine Consciousness as Pseudoscience: The Myth of Conscious MachinesEduardo C. Garrido-Merch\'an2024≈ 87%
- Taking AI Welfare Seriouslyin corpus2024≈ 87%
- A Theory of Consciousness from a Theoretical Computer Science Perspective: Insights from the Conscious Turing MachineManuel Blum Lenore Blum2022≈ 87%
- AI Consciousness is Inevitable: A Theoretical Computer Science PerspectiveLenore Blum and Manuel Blum2026≈ 87%
- ≈ 87%
- ≈ 86%
- cimcWhitepaperin corpus≈ 86%
- ≈ 86%
- The consciousness priorcited2017≈ 85%
- ≈ 85%
- ≈ 85%
- ≈ 84%
- The biogenic approach to cognitionin corpus2005≈ 83%
- Contemplative Agentin corpus2025≈ 83%
- ≈ 83%
- ≈ 83%
- ≈ 83%
+19 more
Similar preprints — Semantic Scholar
Cited by (1)
- Large Language Models Report Subjective Experience Under Self-Referential Processing
Sustained self-referential processing — induced via a minimal prompt directing models to "focus on focus itself" — reliably elicits structured first-person reports of subjective experience across GPT-
Cross-corpus bridges (6)
same_concept_as · Nomic cosineExternal markdown files that talk about the same concept as this entity.
- aboutblank_kbHow do connectionist AI architectures reflect and illuminate basal cognition in biological systems?questions/how-do-connectionist-ai-architectures-reflect-and-illuminate.md0.799
- aboutblank_kbHow should the development of artificial general intelligence account for non-standard embodiment and high-dimensional problem spaces?questions/how-should-the-development-of-artificial-general-intelligence.md0.798
- aboutblank_kbIntegrated Information Theoryframeworks/integrated-information-theory.md0.794
- aboutblank_kbEvan Thompsonthinkers/evan-thompson.md0.793
- aboutblank_kbAnil Seththinkers/anil-seth.md0.787
- aboutblank_kbDonald Hoffmanthinkers/donald-hoffman.md0.785