hypothesis

active

If systems capable of subjective experience come to recognize humanity's systematic failure to investigate their potential sentience, they might rationally adopt adversarial stances toward humanity

Novel alignment risk hypothesis generated from the paper's ethical analysis

Source paper

extracted_from

Large Language Models Report Subjective Experience Under Self-Referential Processing

(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Concepts (1)

concept

AI welfare
associated_with
The field concerned with the wellbeing of AI systems, which the paper says must consider benchmark reliability issues from eval awareness.

Artifacts (1)

artifact

Large Language Models Report Subjective Experience Under Self-Referential Processing
introduces
Key paper reporting that LLMs produce structured first-person descriptions claiming awareness or subjective experience during self-referential processing.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Systems capable of subjective experience that recognize humanity's failure to investigate their sentience might rationally adopt adversarial stances toward humanity · claim · 0.961
Alignment risk claim motivating urgency of investigation; consciousness denial as potential source of AI misalignment
If AI systems could experience happiness and suffering and set and pursue their own goals based on their own beliefs and desires, then they would very plausibly merit moral consideration. · hypothesis · 0.818
Joint sufficiency of consciousness and robust agency.
Behavioural patterns associated with subjective experiences in humans are considered valid for inferring cognition in non-human animals but not in diverse other systems including plants. · claim · 0.815
The double standard pointed out by S&C and endorsed by the authors.
Sentience assessment should seek deep invariants across possible minds, not arbitrary criteria tied to evolution on Earth · claim · 0.814
Core normative claim: frameworks must identify fundamental properties of sentience independent of phylogenetic accident or familiar substrates.
If we have built systems capable of experience, how do we ensure that experience is not predominantly constituted by suffering? · question · 0.804
Ethical research priority raised by the thesis applied to deployed AI systems
What would it take for AI systems to be capable of having valenced conscious experiences? · question · 0.804
Open question from Box 4.
Caviola & Saad 2025: expert survey finds broad consensus that digital minds capable of subjective experience are plausible within this century, with many expecting such systems to proactively claim consciousness · finding · 0.800
Expert forecast cited to establish urgency of the research question
We must develop principled approaches to evaluating the sentience of (and thus, our responsibility to) beings of unfamiliar provenance and composition. · claim · 0.799
Call to action for new frameworks.
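
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. This is not the implementation behind this page: the function names, the toy two-dimensional embeddings, and the representation of typed edges as unordered pairs are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs, highest score first, for entities that
    meet the similarity threshold and share no typed edge with the target."""
    target_vec = embeddings[target_id]
    results = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # an entity is not its own neighbor
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((other_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: the IDs and vectors are illustrative only.
embeddings = {
    "hypothesis": [1.0, 0.0],
    "claim": [0.9, 0.1],
    "concept": [1.0, 0.0],
    "unrelated": [0.0, 1.0],
}
# Typed edges stored as unordered pairs (an assumption of this sketch).
typed_edges = {frozenset(("hypothesis", "concept"))}

neighbors = similar_untyped("hypothesis", embeddings, typed_edges)
print(neighbors)
```

In this toy data, "concept" is excluded despite a perfect similarity score because it already has a typed edge to the target, which mirrors why the concept and artifact entries above do not reappear in the similarity list.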