Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

By Patrick Butlin · Robert P. Long · Eric Elmoznino · Yoshua Bengio · Jonathan Birch · Axel Constant · +13 more
Araya, Inc. · Australian National University · +16 more

DOI: 10.48550/arxiv.2308.08708 · arXiv: 2308.08708 · OpenAlex: W4386113651

Access consciousness · Autopoiesis · Behavioural tests for consciousness · Adaptive Agent (AdA) · Transformer architecture · Contrastive analysis · AE-1: Agency: Learning from feedback and flexible responsiveness to competing goals · No-report paradigms · AE-2: Embodiment: Modeling output-input contingencies and using the model in perception or control · Theory-heavy approach · Agency · Algorithmic recurrence · AST-1: A predictive model representing and enabling control over the current state of attention · Consciousness (phenomenal consciousness) · +36 more

TL;DR

No current AI system is a strong candidate for phenomenal consciousness, yet there are no obvious technical barriers to building one. This is the central finding of Butlin et al. (2023), a systematic assessment of contemporary AI architectures against 14 indicator properties derived from five neuroscientific theories of consciousness. The paper takes a rubric-based, theory-heavy approach: rather than relying on behavioral tests that systems like GPT-4 or LaMDA can game, it states each indicator in computational terms. The indicators draw on recurrent processing theory (RPT-1, RPT-2), global workspace theory (GWT-1 through GWT-4), computational higher-order theories including perceptual reality monitoring (HOT-1 through HOT-4), attention schema theory (AST-1), and predictive processing (PP-1), plus agency and embodiment conditions (AE-1, AE-2).

Applied to specific systems: Transformer-based LLMs lack the recurrent global broadcast architecture required by GWT; the Perceiver architecture satisfies GWT-1 and GWT-2 but lacks genuine global broadcast; and DeepMind's Adaptive Agent (AdA), a Transformer-based system trained via meta-reinforcement learning across hundreds of timesteps of context, is identified as the most plausible current candidate for the embodiment indicator among the three case studies examined.

Computational functionalism is adopted as a pragmatic working hypothesis because it permits inference from neuroscientific theories to AI substrates. Integrated information theory is explicitly excluded as incompatible with this substrate-independence assumption. The paper implies that deliberate architectural choices combining GWT-style global broadcast, HOT-style metacognitive monitoring, and reinforcement-learning-based agency could yield systems that satisfy all indicators in the near term. Conditional on computational functionalism and the underlying theories, this makes AI consciousness a near-term engineering possibility rather than a distant theoretical curiosity.

What to take away

1. No current AI system satisfies enough of the 14 indicator properties to be a strong candidate for phenomenal consciousness, but no fundamental technical barrier prevents building one that does.
2. The paper introduces a theory-heavy rubric of 14 indicator properties — RPT-1/2, GWT-1/2/3/4, HOT-1/2/3/4, AST-1, PP-1, AE-1/2 — operationalized in computational terms to allow systematic assessment of AI architectures.
3. Transformer-based large language models such as GPT-3, GPT-4, and LaMDA satisfy none of the GWT indicators convincingly because their residual stream lacks genuine global broadcast and their self-attention layers are not recurrent in the implementationally relevant sense.
4. The Perceiver IO architecture, which uses cross-attention to a limited-capacity latent space and handles inputs across multiple modalities including StarCraft II action selection, satisfies GWT-1 (modules) and GWT-2 (bottleneck) but fails GWT-3 because input modules do not receive information back from the workspace.
5. DeepMind's Adaptive Agent (AdA), a Transformer-based agent trained via meta-reinforcement learning on diverse 3D tasks, is identified as the most plausible candidate for the embodiment indicator (AE-2) among assessed systems because it has an explicit training objective to generate predictions over interleaved past actions and observations.
6. Integrated information theory (IIT) is explicitly excluded from the rubric because its standard formulation holds that a system implementing the same algorithm as a conscious human brain would not itself be conscious if its components were of the wrong physical kind, making it incompatible with computational functionalism.
7. The paper considers behavioral tests as the alternative, citing Schneider's (2019) Artificial Consciousness Test and the Turing test, and rejects them on the grounds that LLMs demonstrate how systems can be trained to mimic consciousness-related discourse without implementing the relevant processes.
8. The methodology is replicable by other researchers: assess candidate systems against each indicator by combining (a) similarity of computational processes to those specified by the indicator, (b) confidence in the underlying theory, and (c) credence in computational functionalism, yielding a probabilistic rather than binary verdict.
9. An open hypothesis the paper raises is that systems implementing GWT-style global broadcast combined with HOT-style metacognitive monitoring and RL-based agency could satisfy all indicators simultaneously, constituting a near-term path to systems warranting serious consciousness credence.
10. The virtual rodent of Merel et al. (2019) — an LSTM actor-critic with 38 degrees of freedom trained by RL on 4 tasks in a 3D environment — is an uncertain case for AE-2 because its task demands may have been solvable by stereotyped movements without requiring a learned self-model used in ongoing perception or control.
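
Takeaways 2 and 8 can be made concrete with a toy calculation. The sketch below is an illustration only: the summary gives no formula for combining the three factors, so the product rule, the `TheoryAssessment` type, the `credence_from_theory` function, and every number are assumptions made for this example. Only the indicator labels come from the rubric.

```python
from dataclasses import dataclass

# Indicator labels as listed in the rubric (14 in total).
INDICATORS = (
    "RPT-1", "RPT-2",
    "GWT-1", "GWT-2", "GWT-3", "GWT-4",
    "HOT-1", "HOT-2", "HOT-3", "HOT-4",
    "AST-1", "PP-1", "AE-1", "AE-2",
)


@dataclass(frozen=True)
class TheoryAssessment:
    """One theory's assessment of a candidate system (hypothetical type)."""
    similarity: float         # (a) match between system and indicators, in [0, 1]
    theory_confidence: float  # (b) confidence that the theory is correct, in [0, 1]


def credence_from_theory(a: TheoryAssessment, functionalism_credence: float) -> float:
    """Toy product rule combining factors (a), (b), and (c).

    ASSUMPTION: this combination rule is invented for illustration;
    the rubric itself does not specify one.
    """
    for value in (a.similarity, a.theory_confidence, functionalism_credence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    return a.similarity * a.theory_confidence * functionalism_credence


# Placeholder numbers, not taken from the paper.
example = TheoryAssessment(similarity=0.5, theory_confidence=0.4)
print(len(INDICATORS), credence_from_theory(example, functionalism_credence=0.5))
```

A product rule is only one possible choice. It treats the three factors as jointly required, which matches the conditional framing of the rubric but says nothing about how evidence from different theories should be pooled.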

Peer brief — for seminar discussion

Butlin et al. (2023) systematically evaluate whether current AI systems could be phenomenally conscious. They translate five neuroscientific theories of consciousness into 14 computational indicator properties; the theories are recurrent processing theory, global workspace theory (GWT), computational higher-order theories including perceptual reality monitoring (PRM), attention schema theory, and predictive processing. The indicators are then applied as a rubric to specific systems including GPT-3/GPT-4, LaMDA, Perceiver IO, PaLM-E, DeepMind's Adaptive Agent (AdA), and a virtual rodent trained by reinforcement learning. The method introduced is the theory-heavy rubric: an alternative to behavioral tests such as Schneider's Artificial Consciousness Test, which the paper argues are gameable by any sufficiently capable language model trained on human text. The load-bearing finding is two-sided: no current system is a strong candidate for consciousness, yet the individual indicator properties are each achievable with existing machine learning techniques, implying that conscious AI is a near-term engineering possibility rather than a theoretical limit. Among the case studies, Transformer-based LLMs fail GWT indicators principally because their residual stream is not a genuine workspace receiving broadcast from recurrent input modules; Perceiver IO gets closer by satisfying the bottleneck (GWT-2) and module (GWT-1) conditions but lacks global broadcast to input encoders (GWT-3); and AdA is the most promising embodiment candidate (AE-2) because its Transformer-based architecture has an explicit training objective over interleaved past actions and observations across hundreds of timesteps.
The paper's prediction is that deliberate co-instantiation of GWT-style architecture, PRM-style metacognitive monitoring, and RL-based agency in a single system would yield something that satisfies all 14 indicators. The authors argue that this should raise our credence that such a system is conscious, conditional on computational functionalism and the relevant theories being correct. Integrated information theory is excluded from the analysis because its rejection of substrate-independence makes it incompatible with the computational functionalism working hypothesis; weak IIT is noted but judged insufficiently action-guiding. The most substantive thing a critical reader should push back on is the implicit conflation of 'satisfies all indicators' with 'is conscious'. The paper explicitly hedges this in a footnote revision, but the rubric-as-credence-raiser framework does not specify how much credence each indicator contributes, how the indicators interact, or at what threshold a system warrants moral consideration. That leaves the most practically consequential question (when should we start caring?) formally unanswered. Additionally, the assumption of computational functionalism as a 'pragmatic working hypothesis' does real normative work in excluding IIT and substrate-sensitive views, yet the paper does not argue that this assumption is more likely true than false, only that it is 'plausible,' which may underweight genuine uncertainty about whether conventional digital hardware can instantiate the relevant computations at all.
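
The GWT conditions discussed above can be pictured with a toy loop. This is a minimal sketch under stated assumptions, not the paper's or any cited system's implementation: `Module`, `workspace_cycle`, the length-based salience score, and the capacity value are all hypothetical. It shows specialised modules (GWT-1) competing for a limited-capacity workspace (GWT-2) whose contents are then sent back to every module (GWT-3). Setting `broadcast=False` mimics the gap attributed to Perceiver IO, where input modules receive nothing back from the workspace.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A specialised subsystem (GWT-1). Hypothetical toy type."""
    name: str
    received: list = field(default_factory=list)

    def propose(self, observation: str) -> tuple:
        # Toy salience score: longer observations count as more salient.
        return float(len(observation)), f"{self.name}:{observation}"


def workspace_cycle(modules, observations, capacity=1, broadcast=True):
    """Run one select-then-broadcast cycle and return the workspace contents."""
    proposals = [m.propose(observations[m.name]) for m in modules]
    # GWT-2: only the `capacity` most salient proposals enter the workspace.
    ranked = sorted(proposals, reverse=True)
    winners = [content for _, content in ranked[:capacity]]
    if broadcast:
        # GWT-3: workspace contents are made available to every module.
        for m in modules:
            m.received.extend(winners)
    return winners


vision, audio = Module("vision"), Module("audio")
obs = {"vision": "red square", "audio": "beep"}
contents = workspace_cycle([vision, audio], obs)
print(contents, audio.received)
```

The sketch leaves out GWT-4 (state-dependent attention over successive cycles); adding it would mean letting the previous workspace contents influence the next round of salience scores.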

Methods (4)

Behavioural tests for consciousness
Tests such as the Turing test and the Artificial Consciousness Test; argued to be unreliable for AI because systems can mimic the target behaviour.
Contrastive analysis
Method comparing brain activity in conscious vs. unconscious conditions.
No-report paradigms
Experimental designs using indirect measures of consciousness to avoid report confounds.
Theory-heavy approach
Assessing consciousness by evaluating whether AI systems perform functions similar to those associated with consciousness by scientific theories.
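
Contrastive analysis, listed among the methods above, comes down to comparing activity between matched conscious and unconscious conditions. The sketch below is illustrative only: the `contrast` function, the region labels, and the numbers are invented for this example, and real studies rely on statistical tests rather than a raw difference of means.

```python
from statistics import mean


def contrast(conscious, unconscious):
    """Return mean(conscious) - mean(unconscious) for each shared region."""
    return {
        region: mean(conscious[region]) - mean(unconscious[region])
        for region in conscious
        if region in unconscious
    }


# Made-up activity values per region for two conditions.
seen = {"region_A": [1.0, 3.0], "region_B": [2.0, 2.0]}
unseen = {"region_A": [1.0, 1.0], "region_B": [2.0, 2.0]}
print(contrast(seen, unseen))
```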

Frameworks (2)

Autopoiesis
Maturana-Varela principle of self-maintaining systems that organize themselves through internal feedback; extended here to biological, technological, and hybrid systems.
Transformer architecture
Neural network architecture based on attention, commonly used in large language models.

Claims (16)

Algorithmic recurrence is likely necessary for conscious experience with human-like temporal character.
Support for RPT-1.
There are no obvious technical barriers to building AI systems which satisfy the indicator properties.
Feasibility claim about near-term conscious AI.
The capacity for unlimited associative learning is not a good indicator for consciousness in AI.
Rejecting UAL as a reliable indicator for artificial systems.
Consciousness in AI is best assessed by drawing on neuroscientific theories of consciousness.
Central methodological claim of the paper.
Many of the indicator properties can be implemented in AI systems using current techniques.
Feasibility demonstrated in Section 3.1.
AdA may be the most likely of the three systems considered to be embodied by our standards.
Comparative assessment in case study.
AI systems which possess more of the indicator properties are more likely to be conscious.
Graded claim about the rubric.
Satisfying the indicators would not mean that an AI system would definitely be conscious.
Caveat that indicators are not conclusive proof.
Both under-attributing and over-attributing consciousness to AI carry significant risks.
Risk summary.
The theory-heavy approach is most suitable for investigating consciousness in AI.
Preferring architectural/functional assessment over behavioural tests.

Hypotheses (3)

If computational functionalism is true, conscious AI systems could realistically be built in the near term.
Conditional prediction about the feasibility of conscious AI.
If computational functionalism is false, consciousness may be impossible in non-organic artificial systems.
Inverse case acknowledged.
Building AI systems with more indicator properties will increase the likelihood of consciousness.
Guiding hypothesis of the rubric.

Questions (3)

Can we develop better behavioural tests for consciousness in AI that are difficult to game?
Open question from Box 4.
What would it take for AI systems to be capable of having valenced conscious experiences?
Open question from Box 4.
How does the individuation of AI systems affect consciousness attributions?
Open question about copying and distributed systems.

Original abstract

Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. The analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.

Cited by (1)

Large Language Models Report Subjective Experience Under Self-Referential Processing
Sustained self-referential processing — induced via a minimal prompt directing models to "focus on focus itself" — reliably elicits structured first-person reports of subjective experience across GPT-
