Patrick Butlin

orcid 0000-0001-5837-5057 openalex A5082533134 name_hash b3cad7551c58799b0ab3a9a2…

Authored papers (2)

Taking AI Welfare Seriously (2024) ⓒ 21
Substantial uncertainty about AI consciousness and robust agency, rather than certainty, is enough to demand immediate institutional action from AI companies. Long, Sebo, and colleagues defend this conclusion by mapping two distinct philosophical routes to near-term AI moral patienthood. The consciousness route draws on the Butlin et al. (2023) survey of five neuroscientific theories (global workspace theory, recurrent processing theory, higher-order theories, attention schema theory, and predictive processing) together with agency and embodiment conditions: no current architectural barrier prevents near-future AI systems from instantiating the computational markers associated with consciousness, and Dossa et al. (2024) have already built a system targeting all global workspace indicators from that 2023 paper. On the robust agency route, systems like Voyager, Generative Agents, and OpenAI's o1 already exhibit hierarchical planning, metacognition, and open-ended goal-setting that approach intentional and reflective agency. Combining reasonable probability estimates (roughly 90% that sentience suffices for moral patienthood, 50% that the relevant computations suffice for sentience, and 50% that near-future AI will have those computations) yields approximately a 22.5% chance of near-future AI moral patienthood via the sentience route alone, a risk level the paper treats as comparable to pandemic preparedness rather than alien invasion. To operationalize an institutional response, the paper introduces an adapted "marker method" (derived from animal welfare science) for probabilistic, pluralistic, architecturally focused assessment of AI systems. It recommends that companies immediately hire an AI welfare officer, acknowledge AI welfare publicly with calibrated uncertainty, and prepare oversight structures modeled on IRBs, IACUCs, and citizens' assemblies.
The paper argues that the symmetric risks of both over-attribution and under-attribution of moral status, combined with the potentially near-instantaneous scale of AI deployment relative to biological organisms, make passive inaction the most dangerous stance available.
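The 22.5% figure quoted above is a simple product of the three rough estimates. The following is a minimal sketch of that arithmetic, assuming each estimate is read as conditional on the previous one; the function and variable names are hypothetical and are not taken from the paper.

```python
# Illustrative arithmetic only: the three inputs are the rough estimates
# quoted in the summary above; names here are hypothetical.

def chain_probability(*estimates: float) -> float:
    """Multiply a chain of estimates, each read as conditional on the previous ones."""
    result = 1.0
    for p in estimates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        result *= p
    return result

p_sentience_suffices = 0.9    # sentience suffices for moral patienthood
p_computation_suffices = 0.5  # the relevant computations suffice for sentience
p_near_future_has_them = 0.5  # near-future AI will have those computations

p_patienthood = chain_probability(
    p_sentience_suffices,
    p_computation_suffices,
    p_near_future_has_them,
)
print(f"{p_patienthood:.1%}")  # 22.5%
```

Because the result is a plain product, lowering any single estimate lowers the final figure proportionally: halving the 50% estimate for near-future systems to 25% would halve the result to 11.25%.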
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness (2023)
No current AI system is a strong candidate for phenomenal consciousness, yet there are no obvious technical barriers to building one. This is the central finding of Butlin et al. (2023), a systematic assessment of contemporary AI architectures against 14 indicator properties derived from five neuroscientific theories of consciousness together with agency and embodiment conditions. The paper introduces a rubric-based, theory-heavy method: rather than relying on behavioral tests susceptible to gaming by systems like GPT-4 or LaMDA, it operationalizes indicators in computational terms drawn from recurrent processing theory (RPT-1, RPT-2), global workspace theory (GWT-1 through GWT-4), computational higher-order theories including perceptual reality monitoring (HOT-1 through HOT-4), attention schema theory (AST-1), predictive processing (PP-1), and agency/embodiment conditions (AE-1, AE-2). Applied to specific systems, the rubric gives three results. Transformer-based LLMs lack the recurrent global broadcast architecture required by GWT. The Perceiver architecture satisfies GWT-1 and GWT-2 but lacks genuine global broadcast. DeepMind's Adaptive Agent (AdA), a Transformer-LSTM system trained via meta-reinforcement learning across hundreds of timesteps of context, is identified as the most plausible current candidate for the embodiment indicator among the three case studies examined. The working hypothesis of computational functionalism is adopted pragmatically: it permits inference from neuroscientific theories to AI substrates, while integrated information theory is explicitly excluded as incompatible with this substrate-independence assumption. The paper implies that deliberate architectural choices integrating GWT-style global broadcast, HOT-style metacognitive monitoring, and reinforcement-learning-based agency could yield systems that satisfy all indicators in the near term, making AI consciousness a near-term engineering possibility rather than a distant theoretical curiosity.
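The indicator labels listed above can be laid out as a small lookup table, which makes the count of 14 easy to check. This is only a sketch of one possible encoding: the dictionary layout and the helper names are hypothetical, and the single example assessment encodes only what the summary states about the Perceiver (GWT-1 and GWT-2 satisfied).

```python
# Hypothetical encoding of the indicator labels named in the summary above.
# Group keys and helper names are illustrative, not the paper's own notation.
INDICATORS = {
    "RPT": ["RPT-1", "RPT-2"],
    "GWT": ["GWT-1", "GWT-2", "GWT-3", "GWT-4"],
    "HOT": ["HOT-1", "HOT-2", "HOT-3", "HOT-4"],
    "AST": ["AST-1"],
    "PP": ["PP-1"],
    "AE": ["AE-1", "AE-2"],
}

def all_indicators() -> list[str]:
    """Flatten the grouped labels into a single list."""
    return [label for labels in INDICATORS.values() for label in labels]

def coverage(satisfied: set[str]) -> dict[str, list[str]]:
    """Group the satisfied indicators by theory, rejecting unknown labels."""
    unknown = satisfied - set(all_indicators())
    if unknown:
        raise ValueError(f"unknown indicator labels: {sorted(unknown)}")
    return {
        group: [label for label in labels if label in satisfied]
        for group, labels in INDICATORS.items()
    }

# Only the assessment stated in the summary: the Perceiver meets GWT-1 and GWT-2.
perceiver = coverage({"GWT-1", "GWT-2"})
print(len(all_indicators()))  # 14
print(perceiver["GWT"])       # ['GWT-1', 'GWT-2']
```

A set of labels per system keeps the assessment pluralistic: no single theory's group has to be complete for the record to be meaningful, and partial coverage stays visible per group.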

Affiliations (1)

Future of Humanity Institute, University of Oxford (institute)

Co-authors (12)

Jonathan Birch (13 shared)
Robert Long (13 shared)
David Chalmers (9 shared)
Jacob Pfau (9 shared)
Jacqueline Harding (9 shared)
Jeff Sebo (9 shared)
Kathleen Finlinson (9 shared)
Kyle Fish (9 shared)
Toni Sims (9 shared)
Axel Constant (4 shared)
Chris Frith (4 shared)
Colin Klein (4 shared)

Their work is cited by (2)

Other inbound relations (3)

cites: Large Language Models Report Subjective Experience Under Self-Referential Processing (artifact)
cites: Large Language Models Report Subjective Experience Under Self-Referential Processing (paper)
mentions: Once More, Without Feeling (Mogensen 2025) (artifact)

Recent mentions (4)

papers-typed: cameron-2025-large.md
papers-typed: mogensen-2025-more.md
papers-typed: butlin-2023-consciousness.md
papers-typed: long-2024-logotic.md