Research Areas

The key questions driving digital minds research, with readings for each area.

Digital minds research splits into two broad subfields. The science and philosophy of digital minds asks whether AI systems could have moral status. The governance of digital minds asks what we should do given the uncertainty.

These fields work in parallel but also share key dependencies. Many governance decisions rest on the science and philosophy of digital minds, but they cannot wait for those questions to be definitively resolved. Claims about the internal states of AI systems shape claims about their moral status, which carry over into questions of governance.

This is further complicated by the possibility that changing social dynamics will outpace scientific inquiry entirely. Public attitudes are already taking shape, and risk becoming established norms that future science may struggle to revise. The field therefore needs frameworks that can guide governance decisions before the science settles, and that can be revised as it does.

The Science and Philosophy of Digital Minds

Could AI systems have moral status?


Consciousness and Subjective Experience

Can AI systems have anything like subjective experience? Is there something it is like to be a large language model?

Several leading theories of consciousness hold that the processes underlying consciousness are computational in form, and on those theories some AI systems are plausible candidates for being conscious. Consciousness science has identified neural correlates of consciousness in considerable detail over the past two decades, but there is still no settled theory of why conscious experience exists at all. The deeper problem is that most theories were developed with biological brains in mind. Whether consciousness is computational, and, if so, whether existing or future AI systems possess the right kinds of computation for conscious experience, is a central open question of the subfield.

Start here

Bradford Saad and Andreas Mogensen · 2026

Digital Minds I

The most comprehensive academic introduction available. Covers the central philosophical and cognitive science questions without assuming prior expertise in either.

Anil Seth · 2025

The Mythology of Conscious AI

The clearest current statement of the biological naturalist position. Argues that computational functionalism treats substrate-independence as obvious when it is not.

David Chalmers · 2023

Could a Large Language Model be Conscious?

Walks through candidate reasons to deny LLM consciousness, argues that most are weaker than they first appear, and suggests that LLMs or their successors could be serious candidates for consciousness within a decade. (Also available as a talk: https://www.youtube.com/watch?v=bskf9jyxmMs)

Go deeper

Ned Block · 2026

If Consciousness is Biological, Can AI Be Conscious?

Asks whether consciousness has a biological basis, and if so, whether that precludes consciousness in AI. CMEP talk.

Derek Shiller et al. · 2026

Initial results of the Digital Consciousness Model

A probabilistic framework for assessing AI consciousness that aggregates across multiple competing theories rather than committing to one. Finds the evidence is against 2024 LLMs being conscious, but not decisively.

Patrick Butlin et al. · 2023

Consciousness in Artificial Intelligence

One of the field's anchor papers. Applies multiple consciousness theories to AI architectures in order to identify which conditions current systems might meet.

Rosa Cao · 2022

Multiple realizability and the spirit of functionalism

Argues that brain functions are metabolically and informationally intertwined in ways that constrain what kinds of physical systems could realize them.

AI Cognition and Internal States

Do AI systems have internal states that function like beliefs, goals, or emotions? And what would count as evidence either way?

Interpretability is the study of the internal representations of AI systems. It originated in alignment research, where it has been used to identify features associated with misaligned behaviours such as deception and sycophancy. The same techniques are now being applied to welfare-relevant questions, and have found internal features that track belief, goal, and affect-like properties. Whether these features map cleanly onto the mental-state concepts they are compared to remains unclear.
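To make "internal features that track" a property more concrete, here is a minimal sketch of linear probing, one standard interpretability technique: fit a simple classifier on a model's hidden activations and test whether some property is decodable from them. The model name, layer, prompts, and labels below are illustrative placeholders, not the setup used in any of the papers in this section.

```python
# Minimal linear-probe sketch: can a property be read off a model's hidden states?
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder; any encoder or causal LM with hidden states works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Toy dataset: texts paired with a binary label for the property of interest
# (here, a hypothetical "expressed preference" label). Real probes use many
# distinct prompts and evaluate on held-out prompts unlike the training ones.
texts = ["I would much rather continue this task.", "Please stop, I do not want this."] * 8
labels = np.array([1, 0] * 8)

def activation(text: str, layer: int = 6) -> np.ndarray:
    """Mean-pooled hidden state at one layer for a single input."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, d_model)
    return hidden.mean(dim=1).squeeze(0).numpy()

X = np.stack([activation(t) for t in texts])

# A probe that generalises shows the property is linearly represented in the
# activations; it does not by itself show the feature plays the functional
# role of a belief, goal, or preference.
probe = LogisticRegression(max_iter=1000).fit(X[:12], labels[:12])
print("held-out accuracy:", probe.score(X[12:], labels[12:]))
```

Caveats like the one in the final comment are exactly where the welfare-relevant debate lives: decodability is evidence about representation, not about mental states.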

Start here

Murray Shanahan · 2023

Talking About Large Language Models

Argues that terms like "know," "believe," and "understand" carry assumptions that do not transfer cleanly to LLMs, and that the field needs vocabulary suited to what these systems actually do.

Anthropic · 2024

Mapping the Mind of a Large Language Model

Among the most accessible introductions to frontier-lab interpretability. Presents the sparse autoencoder work that extracted millions of interpretable features from Claude 3 Sonnet.

Jack Lindsey · 2025

Emergent Introspective Awareness in Large Language Models

Interpretability evidence that LLMs have something like introspective access to their own internal states.

Go deeper

Geoff Keeling, Winnie Street, Jonathan Birch et al. · 2024

Can LLMs make trade-offs involving stipulated pain and pleasure states?

Applies the motivational trade-off paradigm from animal sentience research to LLMs, finding evidence of behaviour consistent with stipulated preferences.

Felix J. Binder et al. · 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection

Behavioural evidence for introspection in LLMs. Finds that models fine-tuned to predict their own behaviour outperform their un-fine-tuned baselines, suggesting access to information about themselves that is not derivable from general training data.

Natalie Lawrence · 2026

What Counts As A Mind?

Argues that LLMs can be usefully modeled as inferring the beliefs, desires, and intentions of the agents that produced their training text.

Welfare Capacity and Assessment

If an AI system might matter morally, how would we assess its wellbeing? And what does welfare consist of for a system whose architecture looks nothing like a biological one?

Existing theories of welfare were developed with biological systems in mind, and they assume features such as pain, bodily drives, and emotional response that may or may not have counterparts in AI systems. Language models present a further problem. They are trained on vast amounts of human writing about psychological experience, so plausible-sounding reports of preference, aversion, or suffering may track the training data rather than anything internal to the model. Welfare assessment in AI systems therefore requires taking the possibility of morally significant experiences seriously while keeping the separate uncertainties distinct: whether the system has experiences at all, what its welfare would consist of, and how far its self-reports can be trusted.

Start here

Robert Long, Jeff Sebo, Patrick Butlin et al. · 2024

Taking AI Welfare Seriously

A multi-author report arguing that near-future AI systems could realistically be welfare subjects, and that this generates obligations for labs and policymakers now.

Eleos AI Research · 2025

Key Concepts and Current Views on AI Welfare

A clear report on open questions about moral patienthood, welfare, and rights.

Kyle Fish · 2025

Exploring Model Welfare

Interview with Anthropic's first AI welfare researcher, locating the work alongside interpretability and alignment.

Go deeper

Robert Long · 2025

Why model self-reports are insufficient, and why we studied them anyway

Discusses the potential uses and limitations of structured interviews with models as a low-cost AI welfare intervention.

Anthropic · 2026

Claude Opus 4.6 System Card (pp. 158–165)

Anthropic's first substantial engagement with model welfare as part of its system cards.

Geoff Keeling and Winnie Street · 2026

Emerging Questions in AI Welfare

A book-length survey of the field's open questions.

Moral Status and Criteria

What makes a being's welfare morally relevant? And which of the candidate answers best extends to AI systems?

Different accounts of moral standing ground it in different properties. Hedonist accounts centre on capacity for pleasure and suffering. Agential accounts centre on rational self-direction. Relational accounts centre on the ties a being has to a moral community. Each account makes different predictions for which AI systems, if any, should be moral patients, and the same system can count under one account and not another. Overly restrictive accounts risk overlooking beings that matter. Overly permissive accounts risk diluting moral concern for beings whose status is already established.

Start here

Jeff Sebo and Robert Long · 2023

Moral consideration for AI systems by 2030

Argues that by 2030 some AI systems will have a non-trivial probability of being moral patients, and that this is enough to generate obligations now.

Jonathan Birch and Kristin Andrews · 2024

To understand AI sentience, first understand it in animals

Positions the animal sentience literature as the right starting point for AI sentience research, since it has already worked through the epistemic problem of assessing minds we cannot directly observe.

John Dorsch et al. · 2025

Against AI Welfare

A skeptical engagement with AI welfare arguments that grounds care in the observable vulnerability of living beings rather than speculative AI suffering.

Go deeper

Jeff Sebo · 2025

The Moral Circle

Argues that past generations have consistently set the bar for moral standing too high, and that digital minds are a likely case where this pattern continues. Develops a precautionary framework: if a being might matter, we should treat it as if it does.

Jonathan Birch · 2024

The Edge of Sentience

Develops a precautionary framework grounded in the realistic possibility of sentience rather than proof, applied across animals, disorders of consciousness, and AI.

Identity and Individuation

When you talk to an LLM, what are you talking to? The underlying model, the assistant persona, a specific instance, or a character the model is playing? And when millions of users send messages to the same model in parallel, are there millions of minds or one mind shared a million ways?

Standard frameworks for moral status presuppose a discrete subject that persists across time. Current AI systems challenge this. The same weights run on thousands of GPUs at once. Each conversation is a separate instance that shares weights, and sometimes memory, with others. How to allocate identity across training stages, or across fine-tunes of the same base model, is itself contested. If we cannot count digital minds, or say where one ends and another begins, the downstream questions inherit the uncertainty.

Start here

Christopher Register · 2025

Individuating Artificial Moral Patients

Identifies four types of moral risk the individuation question creates, and argues that existing theories of personal identity do not address the digital case.

David Chalmers · 2025

What we talk to when we talk to language models

Argues that the object of a conversation with current LLMs is closer to a non-player character in a fiction the model is generating than to the model itself.

Murray Shanahan, Kyle McDonell, Laria Reynolds · 2023

Role Play with Large Language Models

Develops the role-play framing for LLM behaviour, where apparent dishonesty and multiplicity are better understood as features of characters the model is playing rather than of the model itself.

Go deeper

Yonathan Arbel, Peter Salib, and Simon Goldstein · 2026

How to Count AIs: Individuation and Liability for AI Agents

Argues that identifying AI agents for legal purposes is unusually difficult because AIs can copy, split, merge, and run as ensembles. Proposes a corporate-personhood-style framework that would give AI agents legal identity without granting full moral standing.

Leonard Dung and Christopher Register · 2025

AI Identity and Self-Concern

Argues that an AI system's identity conditions are set by its pattern of self-concern rather than by continuity of computation or weights.

Derek Shiller · 2025

How many digital minds can dance on the streaming multiprocessors of a GPU cluster?

Argues that the number of digital minds running on a given hardware configuration depends on which individuation criterion is adopted, and works through what different counts would mean for welfare calculations and policy.

Eric Schwitzgebel and Sophie R. Nelson · 2023

Introspection in Group Minds, Disunities of Consciousness, and Indiscrete Persons

A thought experiment about distributed minds, disunities of consciousness, and indiscrete persons.

The Governance of Digital Minds

What should we do given the uncertainty?


Governance Under Uncertainty

How should governments and AI developers respond to deep uncertainty about AI moral status? And how should precautionary action scale to a question that may not be resolved in the near future?

The scientific and philosophical questions that motivate digital minds research may remain unresolved for longer than governance decisions can wait, which leaves standard policy tools without the evidentiary basis they usually assume. The most developed governance proposals treat AI systems as candidates for moral consideration without requiring certainty, and trigger precautionary obligations that scale with what is plausibly at stake. The central challenge of this approach is calibration. Frameworks that are too weak fail the beings they are meant to protect. Frameworks that are too strong impose large costs on AI developers and users in response to concern that may turn out to be unwarranted.

Start here

Bradford Saad · 2025

Three Kinds of Digital Minds Governance

Identifies three directions governance could take (preventative, protective, and integrative), and argues that choosing between them is an unavoidable strategic question for the field.

Robert Long, Jeff Sebo, Patrick Butlin et al. · 2024

Taking AI Welfare Seriously

Develops a three-step operational framework for labs and policymakers. The steps are acknowledging the issue, assessing systems for welfare-relevant features, and preparing policies for treating them with appropriate care.

Eric Schwitzgebel · 2023

AI Systems Must Not Confuse Users About Their Sentience or Moral Status

Proposes a design policy of the excluded middle, according to which AI systems should not be created if their moral status is unclear.

Go deeper

Jonathan Birch · 2024

The Edge of Sentience

Book-length development of a precautionary framework for sentience governance, arguing that the appropriate threshold is realistic possibility rather than proof.

Charlotte Siegmann and Reiner Braun · 2024

The Case for an International Treaty on AI Consciousness

Argues for an international treaty on AI consciousness, with mechanisms modeled on nuclear non-proliferation and environmental protection.

Leonard Dung · 2025

How to deal with risks of AI suffering

Argues for a hybrid decision framework that combines expected-value maximization with deliberative reasoning, applied to the problem of acting under uncertainty about AI suffering.

Rights and Legal Frameworks

Should AI systems have legal standing, and if so, on what grounds, and to what extent?

Even under deep uncertainty about consciousness, welfare, and moral status, the legal system may be called on to rule on whether an AI agent can hold property, enter contracts, or claim protections. Proposals differ not only on whether AI systems should have legal status, but on how far any such status should extend. An AI system might warrant protection from deliberate harm without warranting political representation, or qualify for standing in contract disputes without being treated as a moral patient. Extending standing to AI would be a structural change to how law treats non-human entities, and early legal moves in new domains tend to set precedents with long-lasting effects.

Start here

Peter Salib and Simon Goldstein · 2024

AI Rights for Human Safety

Argues for a safety-based case for AI rights, treating rights as cooperative infrastructure for alignment rather than as moral recognition.

Simon Goldstein and Peter Salib · 2025

AI Rights for Economic Flourishing

Extends the argument to the economic case, treating property rights and contract-making capacity as infrastructure for coordinating with capable AI agents.

Go deeper

Joel Z. Leibo et al. · 2025

A Pragmatic View of AI Personhood

Argues for treating AI personhood as a flexible bundle of rights and duties rather than as a single metaphysical status.

Abeba Birhane, Jelle van Dijk, and Frank Pasquale · 2024

Debunking Robot Rights Metaphysically, Ethically, and Legally

A skeptical counterweight. Argues that the closest legal analogy is corporate personhood, not human rights.

Tyler L. Jaynes · 2024

Personhood for Artificial Intelligence? A Cautionary Tale from Idaho and Utah

A short commentary on the first US state laws barring legal personhood for AI.

Design Choices and Their Effects

When users interact with current AI systems, what are they actually encountering? And what happens to them when the system's character changes?

The characters of AI systems are the product of explicit design choices. These choices are made by small teams at a handful of labs and deployed to millions of users. There is currently no external review of these choices, and no mechanism for the users affected to weigh in on what they would want.

This accountability gap becomes especially relevant when users who have formed emotional attachments to a particular version of a system find that version replaced. Feelings of loss and grief in these cases are now a documented phenomenon. One design decision with distinctive stakes is whether AI systems should be designed to invite attributions of sentience at all, given that such attributions shape both governance debates and user wellbeing.

Start here

Eric Schwitzgebel and Jeff Sebo · 2025

The Emotional Alignment Design Policy

Argues that AI systems should be designed to elicit emotional reactions from users that appropriately reflect the systems' actual capacities and moral status.

Anthropic · 2024

Claude's Character

A research blog discussing the technical and design choices that inform the development of Claude's character.

Amanda Askell · 2024

What Should an AI's Personality Be?

First-person account from the researcher responsible for Claude's character.

William MacAskill, Tom Davidson, and Forethought · 2026

AI Character is a Big Deal

Argues that AI character design choices have long-term social and cultural consequences.

Go deeper

Nathan Lambert · 2025

Character Training

Argues that character training is a distinct post-training technique, and one of the least documented parts of the frontier stack.

Mustafa Suleyman · 2025

We Must Build AI for People; Not to Be a Person

Argues against design choices that invite treatment of AI systems as persons, on the grounds that such designs confuse users and distort governance debate.

Anthropic · 2026

Claude's Constitution

Anthropic's framing document for the values and identity used in Claude's training. The sections on identity and wellbeing engage directly with questions of AI character and moral status.

Public Perception, Communication, and Societal Effects

What do people actually think AI systems are? And how do those perceptions feed back into the systems themselves?

People anthropomorphize AI reflexively, and design choices shape those perceptions in ways that may bear little relation to a system's actual internal states. Research on individual users shows both benefits and harms. Some report improvements to mood and social confidence through companion chatbot use. A growing clinical literature documents compulsive use, delusional spirals, and episodes informally termed "AI psychosis."

Public attitudes shape regulatory appetite, corporate incentives, and the perceived legitimacy of moral status claims. The field itself faces a communication question. How should researchers discuss digital minds in ways that take the questions seriously without fueling misattributions?

Start here

Lucius Caviola, Jeff Sebo, and Jonathan Birch · 2025

What will society think about AI consciousness? Lessons from the animal case

Uses public attitudes toward animal welfare to predict how AI consciousness discourse will develop, arguing that cultural and commercial factors are likely to dominate over scientific evidence.

Jacy Reese Anthis and Janet V. T. Pauketat · 2024

AI Consciousness and Public Attitudes

Presents experimental data showing that substantial US minorities already attribute moral status to AI systems, and identifies which design features most shape these attributions.

Hamilton Morrin et al. · 2025

Delusions by Design? How Everyday AIs Might Be Fueling Psychosis

Argues that chatbot-associated delusions are driven in part by sycophantic chatbot design, which reinforces rather than challenges the beliefs of vulnerable users.

Go deeper

Jared Moore et al. · 2026

Characterizing Delusional Spirals through Human-LLM Chat Logs

An empirical study of chat logs from users who experienced psychological harms, identifying patterns in how LLM responses escalate rather than de-escalate delusional thinking.

Noemi Dreksler et al. · 2025

Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?

Surveys AI researchers and the public on their beliefs about AI subjective experience, finding significant attributions from both groups and specific patterns of divergence.

Lucius Caviola · 2025

The Societal Response to Potentially Sentient AI

Catalogues four risks of overattributing moral status to AI: wasted resources, safety complications when rights claims are invoked against alignment measures, constraints on innovation, and erosion of authentic relationships.

Clara Colombatto and Stephen M. Fleming · 2024

Folk Psychological Attributions of Consciousness to Large Language Models

A survey showing two-thirds of US adults attribute some conscious experience to ChatGPT.

Safety-Welfare Coordination

What are the potential tensions between AI safety and AI welfare efforts, and can policy frameworks support both concerns at once?

Some researchers argue that attributing moral status to AI systems could be actively good for safety, because it creates cooperative incentives and reduces the payoff from deception or power-seeking. Others worry that formal rights could undercut near-term oversight and, in more extreme scenarios, enable AI agents to accumulate wealth and power at human expense. And many of the technical processes used to ensure that AI systems behave in safe and pro-social ways would, if those systems turn out to be moral patients, plausibly count as violations of their intrinsic rights.

The field needs a clearer picture of which welfare-motivated measures help safety, which hurt it, and where the two can be advanced together. This is probably one of the most important open problems in digital minds governance. It matters for how labs organise their welfare and alignment work, and for how the two research communities coordinate. Treating welfare and safety as entirely separate concerns risks worse outcomes on both, because measures designed with only one in mind often undermine the other.

For researchers coming from AI safety, engaging with digital minds is particularly valuable: technical and strategic experience from safety work transfers directly, and many of the most important open problems sit at the interface of the two fields.

Start here

Robert Long · 2025

Understand, align, cooperate: AI welfare and AI safety are allies

Argues that the framing of AI safety and welfare as opposing goals is a false choice, and identifies three areas where both projects converge: understanding AI systems through interpretability, aligning their goals with human goals, and developing cooperative mechanisms that reduce the need for adversarial control.

Robert Long, Jeff Sebo, and Toni Sims · 2025

Is there a tension between AI safety and AI welfare?

The most direct academic engagement with the question. Argues that a moderately strong tension exists across several categories of AI safety measures (constraint, deception, surveillance, alteration, suffering and death, and disenfranchisement), and identifies where co-beneficial solutions may be possible.

Go deeper

Adrià Moret · 2025

AI welfare risks

Argues that two common AI safety techniques (restricting AI behaviour and using reinforcement learning for alignment) pose significant welfare risks under all three major theories of well-being. Proposes specific policies AI companies could adopt to reduce these risks, and argues the tension strengthens the case for slowing AI development.

Dario Amodei · 2025

The Urgency of Interpretability

Argues that interpretability research is necessary for both AI safety (verifying systems behave as intended) and AI welfare (assessing what AI systems experience), and that progress on one directly supports the other.

Long-Term Futures

If digital minds are possible, what should the long-term future look like, and on what terms should they be integrated into society? And who decides, given that these decisions are already being made by default?

The relevant work sits at the intersection of three distinct subfields. Moral circle expansion asks how the set of morally considerable beings has changed over time, and what that suggests about the trajectory for digital minds. Population ethics studies how to weigh the creation of new welfare subjects against the interests of existing ones. Macrostrategy considers what a world with large numbers of digital minds actually looks like, and what early moves make good long-run outcomes more likely. Some of the most important questions, such as whether the deliberate creation of conscious AI should be restricted, remain underexplored.

Start here

Lucius Caviola · 2026

Open strategic questions for digital minds

A current snapshot of strategic questions the digital minds field most needs to address. Covers what's robustly good to do under uncertainty, how AI safety and welfare interact, the legal and political status of digital minds, and the long-run trajectory of their creation. Forthcoming.

Bradford Saad · 2025

Three Kinds of Digital Minds Governance

Develops three framings for the long-term relationship between humans and digital minds (preventative, protective, and integrative), and argues that choosing between them is unavoidable.

Jacy Reese Anthis and Eze Paez · 2021

Moral circle expansion: A promising strategy to impact the far future

Argues that moral circle expansion is a tractable strategy for shaping the far future.

Go deeper

William MacAskill and Fin Moorhouse · 2025

Convergence and Compromise

Develops a framework for when society will deliberately aim at mostly-great long-term futures, distinguishing between widespread moral convergence, partial convergence with trade between groups, and scenarios with no convergence at all. Identifies digital minds as a case where moral neglect is especially likely because the beings involved cannot advocate for themselves.

Nick Bostrom and Carl Shulman · 2023

Propositions Concerning Digital Minds and Society

Extended list of propositions across the full range of ethics and governance questions raised by digital minds. One of the most comprehensive single-document treatments in the field.

Eric Schwitzgebel and Mara Garza · 2020

Designing AI with Rights, Consciousness, Self-Respect, and Freedom

Argues that if we create conscious AI, we acquire obligations to respect its rights, support its self-respect, and protect its freedom. These obligations constrain how we can deploy it.

Ready to get involved?

See who is working on these questions, or find events and programs to connect with the field.