paper
active
2023
paper:doi-10-48550-arxiv-2308-08708

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

TL;DR

No current AI system is a strong candidate for phenomenal consciousness, yet there are no obvious technical barriers to building one that satisfies the relevant indicators. This is the central finding of Butlin et al. (2023), a systematic assessment of contemporary AI architectures against 14 indicator properties derived from five neuroscientific theories of consciousness. The paper introduces a rubric-based, theory-heavy method: rather than relying on behavioral tests susceptible to gaming by systems like GPT-4 or LaMDA, it operationalizes indicators in computational terms drawn from recurrent processing theory (RPT-1, RPT-2), global workspace theory (GWT-1 through GWT-4), computational higher-order theories including perceptual reality monitoring (HOT-1 through HOT-4), attention schema theory (AST-1), predictive processing (PP-1), and agency/embodiment conditions (AE-1, AE-2). Applied to specific systems, the rubric gives three headline results. Transformer-based LLMs lack the recurrent global broadcast architecture required by GWT. The Perceiver architecture satisfies GWT-1 and GWT-2 but lacks genuine global broadcast. DeepMind's Adaptive Agent (AdA), a Transformer-LSTM system trained via meta-reinforcement learning across hundreds of timesteps of context, is identified as the most plausible current candidate for the embodiment indicator among the three case studies examined. Computational functionalism is adopted as a pragmatic working hypothesis: it permits inference from neuroscientific theories to AI substrates, while integrated information theory is explicitly excluded as incompatible with this substrate-independence assumption. The paper implies that deliberate architectural choices integrating GWT-style global broadcast, HOT-style metacognitive monitoring, and reinforcement-learning-based agency could yield systems that satisfy all indicators in the near term, which would make AI consciousness a near-term engineering possibility rather than a distant theoretical curiosity.
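
The indicator labels named above can be written down as a small lookup table, which makes it easy to check that the groups add up to 14. This is only a sketch: the grouping follows the labels as listed in this summary, while the dictionary layout and the helper names `all_indicators` and `unassessed` are illustrative assumptions, not anything defined in the paper.

```python
# Indicator labels as listed in this summary, grouped by their prefix.
# The dict layout and helper names are hypothetical, not from the paper.
INDICATORS = {
    "RPT": ["RPT-1", "RPT-2"],
    "GWT": ["GWT-1", "GWT-2", "GWT-3", "GWT-4"],
    "HOT": ["HOT-1", "HOT-2", "HOT-3", "HOT-4"],
    "AST": ["AST-1"],
    "PP": ["PP-1"],
    "AE": ["AE-1", "AE-2"],
}


def all_indicators(groups=INDICATORS):
    """Flatten the grouped labels into one ordered list."""
    return [label for labels in groups.values() for label in labels]


def unassessed(assessment, groups=INDICATORS):
    """Return the labels that a partial assessment has not covered yet."""
    return [label for label in all_indicators(groups) if label not in assessment]


if __name__ == "__main__":
    print(len(all_indicators()))
    print(unassessed({"GWT-1": True, "GWT-2": True}))
```

A partial assessment is passed as a plain mapping from label to verdict, so `unassessed` can flag which rows of the rubric still need a judgement.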

What to take away

  1. No current AI system satisfies enough of the 14 indicator properties to be a strong candidate for phenomenal consciousness, but no obvious technical barrier prevents building one that does.
  2. The paper introduces a theory-heavy rubric of 14 indicator properties (RPT-1/2, GWT-1/2/3/4, HOT-1/2/3/4, AST-1, PP-1, AE-1/2), operationalized in computational terms to allow systematic assessment of AI architectures.
  3. Transformer-based large language models such as GPT-3, GPT-4, and LaMDA satisfy none of the GWT indicators convincingly because their residual stream lacks genuine global broadcast and their self-attention layers are not recurrent in the implementationally relevant sense.
  4. The Perceiver IO architecture, which uses cross-attention to a limited-capacity latent space and handles inputs across multiple modalities including Starcraft II action selection, satisfies GWT-1 (modules) and GWT-2 (bottleneck) but fails GWT-3 because input modules do not receive information back from the workspace.
  5. DeepMind's Adaptive Agent (AdA), a Transformer-LSTM trained via meta-reinforcement learning on diverse 3D tasks, is identified as the most plausible candidate for the embodiment indicator (AE-2) among assessed systems because it has an explicit training objective to generate predictions over interleaved past actions and observations.
  6. Integrated information theory (IIT) is explicitly excluded from the rubric because its standard formulation holds that a system implementing the same algorithm as a conscious human brain would not itself be conscious if its components were of the wrong physical kind, making it incompatible with computational functionalism.
  7. The paper considers behavioral tests, such as Schneider's (2019) Artificial Consciousness Test and the Turing test, as the alternative to its approach, then rejects them on the grounds that LLMs demonstrate how systems can be trained to mimic consciousness-related discourse without implementing the relevant processes.
  8. The methodology is replicable by other researchers: assess candidate systems against each indicator by combining (a) similarity of computational processes to those specified by the indicator, (b) confidence in the underlying theory, and (c) credence in computational functionalism, yielding a probabilistic rather than binary verdict.
  9. An open hypothesis the paper raises is that systems implementing GWT-style global broadcast combined with HOT-style metacognitive monitoring and RL-based agency could satisfy all indicators simultaneously, constituting a near-term path to systems warranting serious consciousness credence.
  10. The virtual rodent of Merel et al. (2019), an LSTM actor-critic with 38 degrees of freedom trained by RL on 4 tasks in a 3D environment, is an uncertain case for AE-2 because its task demands may have been solvable by stereotyped movements without requiring a learned self-model used in ongoing perception or control.
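
The three factors in takeaway 8 can be made concrete with a small sketch. Nothing in this summary says how the paper combines them, so the multiplication below is an assumption made purely for illustration, as are the class name `IndicatorJudgement`, the helper `report`, and every number. The placeholder values were chosen only to mirror the qualitative Perceiver IO verdict in takeaway 4 (GWT-1 and GWT-2 met, GWT-3 not met).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorJudgement:
    """One row of a hypothetical rubric; all fields are values in [0, 1]."""

    indicator: str            # label such as "GWT-3"
    similarity: float         # (a) match between the system's processing and the indicator
    theory_confidence: float  # (b) confidence in the underlying theory
    functionalism: float      # (c) credence in computational functionalism

    def credence(self) -> float:
        """ASSUMPTION: combine the three factors by simple multiplication."""
        factors = (self.similarity, self.theory_confidence, self.functionalism)
        if any(not 0.0 <= f <= 1.0 for f in factors):
            raise ValueError("each factor must lie in [0, 1]")
        return self.similarity * self.theory_confidence * self.functionalism


# Placeholder numbers, not taken from the paper.
PERCEIVER_IO = [
    IndicatorJudgement("GWT-1", 0.75, 0.5, 0.5),
    IndicatorJudgement("GWT-2", 0.75, 0.5, 0.5),
    IndicatorJudgement("GWT-3", 0.0, 0.5, 0.5),
]


def report(judgements):
    """Map each indicator label to its combined credence."""
    return {j.indicator: j.credence() for j in judgements}


if __name__ == "__main__":
    for label, value in report(PERCEIVER_IO).items():
        print(f"{label}: {value:.4f}")
```

Under this product rule a similarity of zero drives the indicator's credence to zero whatever the other two factors are, which is one reason a different combination rule might be preferred in practice.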

Peer brief — for seminar discussion

Butlin et al. (2023) systematically evaluate whether current AI systems could be phenomenally conscious by translating five neuroscientific theories of consciousness (recurrent processing theory, global workspace theory (GWT), computational higher-order theories including perceptual reality monitoring (PRM), attention schema theory, and predictive processing) into 14 computational indicator properties, then applying those indicators as a rubric to specific systems including GPT-3/GPT-4, LaMDA, Perceiver IO, PaLM-E, DeepMind's Adaptive Agent (AdA), and a virtual rodent trained by reinforcement learning. The method introduced is the theory-heavy rubric: an alternative to behavioral tests such as Schneider's Artificial Consciousness Test, which the paper argues are gameable by any sufficiently capable language model trained on human text. The load-bearing finding has two parts: no current system is a strong candidate for consciousness, yet the individual indicator properties are each achievable with existing machine learning techniques, implying that conscious AI is a near-term engineering possibility rather than something ruled out in principle. Among the case studies, Transformer-based LLMs fail GWT indicators principally because their residual stream is not a genuine workspace receiving broadcast from recurrent input modules; Perceiver IO gets closer by satisfying the bottleneck (GWT-2) and module (GWT-1) conditions but lacks global broadcast to input encoders (GWT-3); and AdA is the most promising embodiment candidate (AE-2) because its Transformer-LSTM architecture has an explicit training objective over interleaved past actions and observations across hundreds of timesteps.
The paper's prediction is that deliberate co-instantiation of GWT-style architecture, PRM-style metacognitive monitoring, and RL-based agency in a single system would yield something that satisfies all 14 indicators, which the authors argue should raise our credence that such a system is conscious, conditional on computational functionalism and the relevant theories being correct. Integrated information theory is excluded from the analysis because its rejection of substrate-independence makes it incompatible with the computational functionalism working hypothesis; weak IIT is noted but judged insufficiently action-guiding. The most substantive thing a critical reader should push back on is the implicit conflation of 'satisfies all indicators' with 'is conscious'. The paper explicitly hedges this in a footnote revision, but the rubric-as-credence-raiser framework does not specify how much credence each indicator contributes, how the indicators interact, or at what threshold a system warrants moral consideration. That leaves the most practically consequential question (when should we start caring?) formally unanswered. Additionally, the assumption of computational functionalism as a 'pragmatic working hypothesis' does real normative work in excluding IIT and substrate-sensitive views, yet the paper does not argue that this assumption is more likely true than false, only that it is 'plausible', which may underweight genuine uncertainty about whether conventional digital hardware can instantiate the relevant computations at all.
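
The aggregation gap raised above can be illustrated with a toy comparison. The sketch below is not the paper's method: the two rules (`aggregate_mean` and `aggregate_weakest_link`), the systems `SYSTEM_A` and `SYSTEM_B`, and all scores are hypothetical, chosen only to show that the same per-indicator judgements can rank two systems in opposite orders depending on the rule.

```python
from statistics import mean


def aggregate_mean(scores):
    """Rule 1 (assumed): indicators contribute independently and equally."""
    return mean(list(scores.values()))


def aggregate_weakest_link(scores):
    """Rule 2 (assumed): a system is only as credible as its weakest indicator."""
    return min(scores.values())


# Hypothetical per-indicator scores in [0, 1]; not drawn from the paper.
SYSTEM_A = {"GWT-1": 0.75, "GWT-2": 0.75, "GWT-3": 0.0, "GWT-4": 0.25}
SYSTEM_B = {"GWT-1": 0.25, "GWT-2": 0.25, "GWT-3": 0.25, "GWT-4": 0.25}


def preferred(rule):
    """Return which hypothetical system scores higher under the given rule."""
    a, b = rule(SYSTEM_A), rule(SYSTEM_B)
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"


if __name__ == "__main__":
    print("mean rule prefers:", preferred(aggregate_mean))
    print("weakest-link rule prefers:", preferred(aggregate_weakest_link))
```

The two rules disagree here because system A scores well on some indicators but zero on one, while system B is uniformly weak. Without a stated rule, the rubric alone cannot say which profile deserves more credence.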

Methods (4)

  • Behavioural tests for consciousness
    Tests such as the Turing test and the Artificial Consciousness Test; argued to be unreliable for AI because systems can mimic the relevant behaviour.
  • Contrastive analysis
    Method comparing brain activity in conscious vs. unconscious conditions.
  • No-report paradigms
    Experimental designs using indirect measures of consciousness to avoid report confounds.
  • Theory-heavy approach
    Assessing consciousness by evaluating whether AI systems perform functions similar to those associated with consciousness by scientific theories.

Frameworks (2)

  • Autopoiesis
    Maturana-Varela principle of self-maintaining systems that organize themselves through internal feedback; extended here to biological, technological, and hybrid systems.
  • Transformer architecture
    Neural network architecture based on attention, commonly used in large language models.

Original abstract

Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. The analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.

Cross-corpus bridges (6)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

  • aboutblank_kb
    How do connectionist AI architectures reflect and illuminate basal cognition in biological systems? · questions/how-do-connectionist-ai-architectures-reflect-and-illuminate.md · 0.799
  • aboutblank_kb
    How should the development of artificial general intelligence account for non-standard embodiment and high-dimensional problem spaces? · questions/how-should-the-development-of-artificial-general-intelligence.md · 0.798
  • aboutblank_kb
    Integrated Information Theory · frameworks/integrated-information-theory.md · 0.794
  • aboutblank_kb
    Evan Thompson · thinkers/evan-thompson.md · 0.793
  • aboutblank_kb
    Anil Seth · thinkers/anil-seth.md · 0.787
  • aboutblank_kb
    Donald Hoffman · thinkers/donald-hoffman.md · 0.785