Taking AI Welfare Seriously

By Robert Long · Jeff Sebo · Patrick Butlin · Kathleen Finlinson · Kyle Fish · Jacqueline Harding · +4 more
Anthropic, Center for AI Safety, +8 more

DOI: 10.48550/arxiv.2411.00986 · arXiv: 2411.00986 · OpenAlex: W4404350868

AI Safety · Agency and Embodiment Framework (for consciousness) · AI Consciousness Test (ACT) · Metacognition · Attention Schema Theory · Experience Replay · Citizens' Assemblies · Marker Method · Computational Functionalism · Monte Carlo Tree Search · Desire-Satisfaction View of Welfare · Self-Report Method for AI Introspection · Ethical Behaviorism · Global workspace theory · +10 more

TL;DR

Substantial uncertainty about AI consciousness and robust agency, not certainty, is enough to demand immediate institutional action from AI companies. Long, Sebo, and colleagues defend this conclusion by mapping two distinct philosophical routes to near-term AI moral patienthood. On the consciousness route, the paper draws on the survey by Butlin et al. (2023) of six neuroscientific theories (recurrent processing, global workspace theory, higher-order theories, attention schema theory, predictive processing, and agency/embodiment) to argue that no current architectural barrier prevents near-future AI systems from instantiating the computational markers associated with consciousness; Dossa et al. (2024) have already built a system targeting all of the global workspace indicators from that 2023 paper. On the robust agency route, systems like Voyager, Generative Agents, and OpenAI's o1 already exhibit hierarchical planning, metacognition, and open-ended goal-setting that approach intentional and reflective agency.
To operationalize the institutional response, the paper introduces an adapted "marker method" (derived from animal welfare science) for probabilistic, pluralistic, architecturally focused assessment of AI systems. It recommends that companies immediately hire an AI welfare officer, acknowledge AI welfare publicly with calibrated uncertainty, and prepare oversight structures modeled on IRBs, IACUCs, and citizens' assemblies. The paper argues that the risks of both over-attribution and under-attribution of moral status, combined with the potentially near-instantaneous scale of AI deployment relative to biological organisms, make passive inaction the most dangerous stance available.
Combining reasonable probability estimates (roughly 90% that sentience suffices for moral patienthood, 50% that relevant computations suffice for sentience, 50% that near-future AI will have those computations) yields approximately a 22.5% chance of near-future AI moral patienthood via the sentience route alone, a risk level the paper treats as comparable to pandemic preparedness rather than alien invasion.
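
The 22.5% figure is simply the product of the three premise credences. A minimal sketch of that arithmetic follows, assuming (as the multiplication itself does) that the premises can be treated as independent; the dictionary keys and the function name are illustrative and do not come from the paper.

```python
from math import prod

# Premise credences quoted in the summary (treated here as given inputs).
premises = {
    "sentience_suffices_for_moral_patienthood": 0.90,
    "computations_suffice_for_sentience": 0.50,
    "near_future_ai_has_those_computations": 0.50,
}


def joint_credence(credences):
    """Multiply premise credences, treating the premises as independent."""
    return prod(credences.values())


p_patienthood = joint_credence(premises)
print(f"{p_patienthood:.1%}")  # prints 22.5%
```

Changing any single input scales the result proportionally, which is why the estimate is only as good as the three stipulated credences.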

What to take away

1. Combining defensible probability estimates — ~90% that sentience suffices for moral patienthood, ~50% that relevant computations suffice for sentience, ~50% that near-future AI will have those computations — yields approximately a 22.5% chance of near-future AI moral patienthood via the sentience route alone, which the paper treats as a non-negligible policy-triggering risk.
2. Butlin et al. (2023) surveyed six neuroscientific theories of consciousness (recurrent processing, global workspace theory, higher-order theories, attention schema theory, predictive processing, and agency/embodiment) and found no clear architectural barriers to near-future AI systems satisfying the associated computational markers.
3. David Chalmers judges it 'not unreasonable' to assign over 50% credence that sophisticated LLM+ systems with all proposed consciousness-relevant properties will exist within a decade, and over 50% credence that such systems would be conscious, implying a combined credence of 25% or more for near-term AI consciousness.
4. Dossa et al. (2024) have already built an AI system explicitly designed to implement all of the global workspace indicators enumerated in Butlin et al. (2023), demonstrating a direct development pathway from theoretical markers to implemented architecture.
5. The paper introduces an adapted 'marker method' — borrowed from animal welfare science, as used by Birch et al. (2021) to extend UK animal welfare law to cephalopods and decapods — as the central assessment instrument for probabilistic, pluralistic, architecturally-focused evaluation of AI moral patienthood.
6. As a replicable methodology, the paper recommends a four-level probabilistic framework: assess (1) which capacities suffice for moral patienthood, (2) which computational features suffice for each capacity, (3) which architectural markers are proxies for those features, and (4) which deployed systems possess those markers — with each level evaluated pluralistically across ethical and scientific theories.
7. An open question the paper raises is whether the asymmetry between over-attribution and under-attribution risks — which clearly favors precaution in the animal welfare case — holds for AI, or whether the risks are sufficiently symmetric to preclude a simple 'err on the side of caution' strategy.
8. Language agents including Voyager (Wang et al., 2023), Generative Agents (Park et al., 2023), and ReAct (Yao et al., 2023) already exhibit flexible goal-setting, episodic memory integration, and metacognition, which the paper treats as significant partial indicators of intentional and reflective agency.
9. In a survey of the Association for the Scientific Study of Consciousness, only 3% of members answered 'no' to whether machines could have consciousness, while over two-thirds answered 'yes' or 'probably yes', and a 2020 PhilPapers survey found ~39% of professional philosophers accept or lean toward future AI consciousness.
10. The paper recommends that AI companies immediately appoint a formally recognized AI welfare officer (directly responsible individual) with authority to access internal information, make recommendations, and coordinate with AI safety teams, as the minimal first institutional step toward responsible AI welfare governance.
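
The four-level framework in item 6 can be sketched as a small calculation. This is a simplified illustration, not the paper's procedure: it assumes the scientific theories are mutually exclusive hypotheses whose credences sum to at most 1, it covers a single capacity, and every number below is hypothetical. The names `TheoryRow`, `capacity_credence`, and `patienthood_credence` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class TheoryRow:
    theory: str                  # scientific theory of the capacity
    credence: float              # level 2: this theory's features suffice
    p_markers_track: float       # level 3: markers are good proxies
    p_system_has_markers: float  # level 4: the system shows the markers


def capacity_credence(rows):
    """Levels 2-4: credence-weighted sum across rival theories."""
    if sum(r.credence for r in rows) > 1.0 + 1e-9:
        raise ValueError("theory credences must not sum to more than 1")
    return sum(
        r.credence * r.p_markers_track * r.p_system_has_markers for r in rows
    )


def patienthood_credence(p_capacity_suffices, rows):
    """Level 1 (the capacity suffices) times the levels 2-4 estimate."""
    return p_capacity_suffices * capacity_credence(rows)


# HYPOTHETICAL inputs for one system, chosen only to show the mechanics.
rows = [
    TheoryRow("global workspace theory", 0.4, 0.8, 0.5),
    TheoryRow("higher-order theories", 0.3, 0.5, 0.2),
    TheoryRow("attention schema theory", 0.1, 0.5, 0.2),
]

estimate = patienthood_credence(0.9, rows)
print(round(estimate, 3))
```

Keeping each level as a separate input makes it clear which disagreement (ethical, scientific, or empirical) drives a change in the final estimate.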

Peer brief — for seminar discussion

Long, Sebo, and colleagues argue that the realistic possibility, not certainty, of AI consciousness and robust agency within roughly the next decade is sufficient to obligate AI companies to take immediate, low-cost precautionary steps. The argument proceeds via two parallel philosophical routes. The consciousness route holds that there is a non-negligible probability that (a) consciousness suffices for moral patienthood, (b) certain computational features suffice for consciousness, and (c) near-future AI systems will have those features. Drawing on the survey by Butlin et al. (2023) of six neuroscientific theories evaluated through computational functionalism, the paper argues that no current architectural barrier prevents satisfaction of these markers. The robust agency route is structurally identical but tracks intentional, reflective, and rational agency rather than consciousness, pointing to systems like Voyager, Generative Agents, OpenAI's o1, and DeepMind's Adaptive Agent as evidence of incremental progress.
Combining reasonable probability assignments (approximately 90% that sentience suffices for moral patienthood, 50% for the computational sufficiency claim, 50% for near-future AI capability) yields roughly a 22.5% probability of AI moral patienthood via the sentience route alone, a risk the paper explicitly compares to pandemic preparedness rather than science fiction. The load-bearing methodological contribution is an adaptation of the animal welfare 'marker method' for use with AI systems. This is the same pluralistic, probabilistic instrument that Birch et al. (2021) applied to cephalopods and decapods, influencing UK law. The key modification is that behavioral evidence is deprioritized in favor of architectural and computational evidence, given AI systems' capacity to game behavioral tests.
Three institutional recommendations follow: acknowledge AI welfare with calibrated public uncertainty, assess AI systems using the adapted marker method within a four-level probabilistic framework, and prepare governance structures (modeled on IRBs, IACUCs, and citizens' assemblies, with their limitations acknowledged), starting with the appointment of a formally recognized AI welfare officer. The paper also raises the hypothesis that symmetric over- and under-attribution risks, combined with AI's potential for near-instantaneous deployment scale, make passive inaction uniquely dangerous compared to the animal welfare case. An alternative assessment approach, which the paper explicitly rejects, is a purely behavioral or Turing-style evaluation framework.
A critical reader should push back on the probability arithmetic. The three premise probabilities are stipulated rather than derived from systematic expert elicitation or empirical evidence, and the independence assumption is almost certainly violated: the same philosophical and empirical uncertainties that bear on the plausibility of functionalism also bear on whether the relevant computations will exist in near-future AI, so the simple product likely misrepresents the actual credence structure. This does not defeat the precautionary argument, but it weakens the quantitative framing that gives the paper much of its rhetorical force.
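
One way to see how much the independence assumption matters is to compare the simple product with the range of joint credences that the same three marginals allow when nothing is assumed about dependence. The sketch below uses the standard Fréchet inequalities for a conjunction of events; it is an illustration added here, not an analysis from the paper, and the function name is hypothetical.

```python
from math import prod


def frechet_bounds(probs):
    """Bounds on P(all events hold) given only the marginal probabilities.

    Lower bound: max(0, sum(p) - (n - 1)); upper bound: min(p).
    """
    n = len(probs)
    lower = max(0.0, sum(probs) - (n - 1))
    upper = min(probs)
    return lower, upper


# Premise credences quoted in the brief above.
marginals = [0.90, 0.50, 0.50]

independent = prod(marginals)              # the 22.5% headline figure
lower, upper = frechet_bounds(marginals)   # range with arbitrary dependence

print(f"independent: {independent:.3f}, bounds: [{lower:.1f}, {upper:.1f}]")
```

With these inputs, the same premise credences are compatible with a joint credence anywhere from 0 to 0.5. The product is one point in that range, so how the premises depend on one another matters as much as the premise values themselves.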

Methods (5)

AI Consciousness Test (ACT)
Proposed test for AI consciousness by Schneider and Turner; uses verbal outputs.
Experience Replay
RL technique using episodic memory to improve sample efficiency; used in some game-playing agents.
Marker Method
Method for assessing consciousness in nonhuman animals by identifying behavioral/anatomical markers from humans and extrapolating; proposed adaptation for AI.
Monte Carlo Tree Search
Search algorithm used in AlphaGo and proposed for combining with LLMs.
Self-Report Method for AI Introspection
Technique of eliciting and interpreting AI self-reports to assess internal states; discussed as promising but challenging.

Frameworks (17)

Agency and Embodiment Framework (for consciousness)
Set of indicators involving minimal agency and embodiment as listed in Butlin et al. 2023.
Attention Schema Theory
Theory by Graziano linking consciousness to a predictive model of attention; listed in Butlin et al. 2023.
Citizens' Assemblies
Deliberative bodies of randomly selected citizens; proposed model for public input on AI welfare policy.
Computational Functionalism
Hypothesis that some class of computations suffices for consciousness; central assumption for AI consciousness route.
Desire-Satisfaction View of Welfare
Philosophical theory that welfare consists in desire-satisfaction, independent of conscious experience.
Ethical Behaviorism
View that we should treat an entity as a moral patient if it is performatively equivalent to other entities with significant moral status.
Global workspace theory
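Theory that consciousness arises when information is broadcast through a limited-capacity global workspace to many specialized processes; one of the indicator sets in Butlin et al. 2023.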
Higher-Order Theories of Consciousness
Theories requiring metacognition or higher-order representations for consciousness; one of the indicator sets in Butlin et al. 2023.
Institutional Animal Care and Use Committees (IACUCs)
Committees overseeing animal research; proposed as partial model for AI welfare oversight.
Institutional Review Boards (IRBs)
Oversight committees for human subjects research; proposed as partial model for AI welfare oversight.
Kantian Ethical Theory
Ethical theory emphasizing respect for rational agents; mentioned as one of the two most influential modern ethical theories.
Language Agents Framework
Approach embedding LLMs within architectures for memory, planning, reasoning, action-selection; route to robust agency.
Precautionary Principle
Decision principle emphasizing caution under uncertainty; mentioned as term of art and in ordinary sense.
Predictive Processing Theory
Theory that perception and cognition involve predictive coding; listed with indicators in Butlin et al. 2023.
Recurrent Processing Theory (RPT)
A neuroscientific theory claiming that recurrent processing in perceptual areas is necessary and sufficient for conscious vision.
Responsible Scaling Frameworks
Industry frameworks for navigating AI safety threats; template for AI welfare governance.
Utilitarianism
Ethical theory emphasizing beneficence for sentient beings; contrasted with Kantianism.

Findings (2)

Francken et al. 2022 survey of ASSC members: only 3% responded 'no' to machines having consciousness
Survey result showing widespread expert openness to machine consciousness.
Bourget and Chalmers 2020 survey: ~39% of philosophers accept or lean toward future AI consciousness
Survey result on philosophical attitudes toward AI consciousness.

Claims (11)

Computational functionalism is plausible and well-supported, so it would be a mistake to dismiss near-future AI welfare solely on the basis of high-level arguments against this assumption.
Defense against biological substrate objections.
There is a realistic possibility that some AI systems will be conscious and/or robustly agentic, and thus morally significant, in the near future.
Central thesis of the report.
Even if the chance of near-future AI moral patienthood were as low as 2%, that would still constitute a non-negligible risk.
Argument that uncertainty is not a reason to ignore the issue.
The marker method can be adapted for AI systems by focusing less on behavioral evidence and more on architectural evidence.
Proposal for assessment framework.
AI welfare is an important and difficult issue; we will not handle it well simply by reacting to situations as they arise.
Motivation for proactive steps.
LLM self-reports about consciousness and moral significance should express degrees of confidence and provide context.
Recommendation for companies on LM outputs.
The risk of under-attribution of AI welfare appears to be both reasonably likely and reasonably harmful.
Key motivation for precautionary action.
AI companies have a responsibility to acknowledge, assess, and prepare for AI welfare.
Primary recommendation of the report.
AI companies should immediately hire or appoint an AI welfare officer.
First structural step recommended.
Robust agency suffices for moral patienthood.
Normative premise of the robust agency route.

Hypotheses (3)

If AI systems could experience happiness and suffering and set and pursue their own goals based on their own beliefs and desires, then they would very plausibly merit moral consideration.
Joint sufficiency of consciousness and robust agency.
If computational functionalism is correct, then some computations suffice for consciousness and the question is which ones and when they will exist in AI.
Conditional underlying the consciousness route.
If an AI system could be a welfare subject and moral patient, then many model instances could be run after training, scaling up the problem rapidly.
Scalability concern.

Questions (8)

What if the probability of AI welfare and moral patienthood is low?
Decision-making under low probability but high stakes.
What if these capacities are insufficient for moral patienthood?
Exploration of uncertainty about normative bases.
What if these features are insufficient for these capacities?
Exploration of uncertainty about computational sufficiency.
Will some AI systems be robustly agentic in the near future?
Key empirical question for the agency route.
Will some AI systems be conscious in the near future?
Key empirical question for the consciousness route.
Does consciousness suffice for moral patienthood?
Key normative question for the consciousness route.
Does robust agency suffice for moral patienthood?
Key normative question for the agency route.
What if these routes encounter a roadblock?
Exploration of uncertainty about AI progress.

Original abstract

In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood, i.e. of AI systems with their own interests and moral significance, is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern. To be clear, our argument in this report is not that AI systems definitely are, or will be, conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not.

Related work: refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
in corpus
2023
≈ 87%
A Human-centric Framework for Debating the Ethics of AI Consciousness Under Uncertainty
Haiqiang Dai, Bin Ling, Ying Nian Wu, Demetri Terzopoulos, Zhou Ziheng
2025
≈ 87%
AI: a Bridge toward Diverse Intelligence and Humanity’s Future
in corpus
2024
≈ 86%
Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem
Felipe S. Abrahão, Olaf Witkowski, Hector Zenil, Alberto Hernández-Espinosa
2025
≈ 86%
AI Safety, Alignment, and Ethics (AI SAE)
Dylan Waldner
2025
≈ 85%
Agentic AI for Cybersecurity: A Meta-Cognitive Architecture for Governable Autonomy
Andrei Kojukhov and Arkady Bovshover
2026
≈ 85%
Agentic AI and the next intelligence explosion
Benjamin Bratton, Blaise Agüera y Arcas, James Evans
2026
≈ 85%
Generalizing frameworks for sentience beyond natural species
in corpus
≈ 85%
Can We Test Consciousness Theories on AI? Ablations, Markers, and Robustness
Yin Jun Phua
2025
≈ 85%
Agentic AI, Retrieval-Augmented Generation, and the Institutional Turn: Legal Architectures and Financial Governance in the Age of Distributional AGI
Marcel Osmond
2026
≈ 84%
Agentic AI: Autonomy, Accountability, and the Algorithmic Society
Hannah Hanwen Chang, Anirban Mukherjee
2025
≈ 84%
Why we need an AI-resilient society
Thomas Bartz-Beielstein
2026
≈ 84%
Ghost in the Machine: Examining the Philosophical Implications of Recursive Algorithms in Artificial Intelligence Systems
Llewellin RG Jegels
2025
≈ 84%
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
Ben Smith, Peter Vamplew, Catalin Mitelut
2023
≈ 84%
AI Generations: From AI 1.0 to AI 4.0
Hengxu You, Jing Du, Jiahao Wu
2025
≈ 84%
AI Epidemiology: achieving explainable AI through expert oversight patterns
Kit Tempest-Walters
2026
≈ 84%
Agentic AI, Medical Morality, and the Transformation of the Patient-Physician Relationship
Robert Ranisch and Sabine Salloch
2026
≈ 84%
Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI
Yifan Li, Minghao Zhao, Wentao Zhang, Zijian Zhang, Yaoman Li, Irwin King, Jinhu Qi
2026
≈ 84%
CAT'S THEORY: Empirical Validation and Architectural Applications Cross-Architecture AI Consciousness Recognition and the Foundation for Constraint-Preserving Recursive Intelligence
in corpus
2022
≈ 84%
Multiple ways to implement and infer sentience
in corpus
≈ 84%
Collective intelligence: A unifying concept for integrating biology across scales and substrates
in corpus
2024
≈ 83%
Contemplative Agent
in corpus
2025
≈ 83%
AI as a Buddhist Self-Overcoming Technique in Another Medium
in corpus
2025
≈ 83%
Brains and where else? Mapping theories of consciousness to unconventional embodiments
in corpus
2026
≈ 82%
Once more, without feeling
in corpus
2025
≈ 82%
Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds
in corpus
2022
≈ 82%
The Machine Consciousness Hypothesis
in corpus
≈ 82%
Alignment faking in large language models
in corpus
2024
≈ 82%
The computational boundary of a 'self': developmental bioelectricity drives multicellularity and scale-free cognition
in corpus
2019
≈ 82%

Cited by (1)

Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
Quantitative introspection—the causal coupling between an instruction-tuned LLM's numeric self-report and a probe-defined internal emotive direction—is demonstrably present in models as small as LLaMA

Cross-corpus bridges (5)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

aboutblank_kb
Can AI systems possess genuine embodiment and agency even if not traversing physical space? · questions/can-ai-systems-possess-genuine-embodiment-and-agency.md · 0.815
aboutblank_kb
How should we establish moral responsibility toward diverse agents when verbal reports and brain homology are insufficient indicators of sentience? · questions/how-should-we-establish-moral-responsibility-toward-diverse.md · 0.813
aboutblank_kb
How should AI safety and ethics frameworks be reconsidered if minds exist at vastly different scales than humans? · questions/how-should-ai-safety-and-ethics-frameworks-be.md · 0.810
aboutblank_kb
Can stress-reduction and care-expansion serve as universal principles for developing ethical artificial general intelligence? · questions/can-stressreduction-and-careexpansion-serve-as-universal-principles.md · 0.802
zen
Ethics Crisis: Pre-History & Aug-Oct 2025 · sbzc/sbzc-2025-ethics-crisis.md · 0.759