Research Areas
The key questions driving digital minds research, with readings for each area.
Digital minds research splits into two broad subfields. The science and philosophy of digital minds asks whether AI systems could have moral status. The governance of digital minds asks what we should do given the uncertainty.
These fields work in parallel but share key dependencies. Many governance decisions rest on the science and philosophy of digital minds, yet they cannot wait for those questions to be definitively resolved. Claims about the internal states of AI systems shape claims about their moral status, which in turn shape questions of governance.
This is further complicated by the possibility that changing social dynamics will outpace scientific inquiry entirely. Public attitudes are already taking shape, and risk hardening into established norms that future science may struggle to revise. The field therefore needs frameworks that can guide governance decisions before the science settles, and that can be revised as it does.
The Science and Philosophy of Digital Minds
Could AI systems have moral status?
Consciousness and Subjective Experience
Can AI systems have anything like subjective experience? Is there something it is like to be a large language model?
Several leading theories of consciousness hold that the processes underlying consciousness are computational in form, and on those theories some AI systems are plausible candidates for consciousness. Consciousness science has mapped the neural correlates of consciousness in considerable detail over the past two decades, but there is still no settled theory of why conscious experience exists at all. The deeper problem is that most theories were developed with biological brains in mind. Whether consciousness is computational, and if so whether existing or future AI systems perform the right kinds of computation for conscious experience, is the subfield's central open question.
Start here
Bradford Saad and Andreas Mogensen · 2026
Digital Minds I
The most comprehensive academic introduction available. Covers the central philosophical and cognitive science questions without assuming prior expertise in either.
Anil Seth · 2025
The Mythology of Conscious AI
The clearest current statement of the biological naturalist position. Argues that computational functionalism treats substrate-independence as obvious when it is not.
David Chalmers · 2023
Could a Large Language Model be Conscious?
Walks through candidate reasons to deny LLM consciousness, claims that most are weaker than expected, and suggests that LLMs could be a serious candidate within a decade. (Also available as a talk: https://www.youtube.com/watch?v=bskf9jyxmMs)
Go deeper
Ned Block · 2026
If Consciousness is Biological, Can AI Be Conscious?
Asks whether consciousness has a biological basis, and if so, whether that precludes consciousness in AI. CMEP talk.
Derek Shiller et al. · 2026
Initial results of the Digital Consciousness Model
A probabilistic framework for assessing AI consciousness that aggregates across multiple competing theories rather than committing to one. Finds the evidence is against 2024 LLMs being conscious, but not decisively.
Patrick Butlin et al. · 2023
Consciousness in Artificial Intelligence
One of the field's anchor papers. Applies multiple consciousness theories to AI architectures in order to identify which conditions current systems might meet.
Rosa Cao · 2022
Multiple realizability and the spirit of functionalism
Argues that brain functions are metabolically and informationally intertwined in ways that constrain what kinds of physical systems could realize them.
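The theory-aggregation approach described in the Shiller et al. entry above can be made concrete with a toy calculation. The idea is to avoid committing to any one theory of consciousness: assign a credence to each candidate theory, estimate the probability that a given system meets that theory's criteria, and combine them. The sketch below is purely illustrative — the theory names, weights, and conditional probabilities are hypothetical placeholders, not values from the Digital Consciousness Model.

```python
# Toy sketch of theory-weighted aggregation for AI consciousness credences.
# All theory names, weights, and conditional probabilities below are
# hypothetical placeholders, not values from any published model.

def aggregate_credence(theories):
    """P(conscious) = sum over theories of P(theory) * P(meets criteria | theory)."""
    return sum(t["weight"] * t["p_meets_criteria"] for t in theories)

theories = [
    {"name": "global workspace", "weight": 0.4, "p_meets_criteria": 0.2},
    {"name": "higher-order",     "weight": 0.3, "p_meets_criteria": 0.1},
    {"name": "biological basis", "weight": 0.3, "p_meets_criteria": 0.0},
]

print(round(aggregate_credence(theories), 2))  # 0.11
```

The output illustrates the framework's character: even if no single theory assigns a high probability, the aggregate credence is rarely zero, which is why such models tend to find evidence "against, but not decisively."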
AI Cognition and Internal States
Do AI systems have internal states that function like beliefs, goals, or emotions? And what would count as evidence either way?
Interpretability is the study of the internal representations of AI systems. It originated in alignment research, where it has been used to identify features associated with misaligned behaviours such as deception and sycophancy. The same techniques are now being applied to welfare-relevant questions, and have surfaced internal features that track belief-like, goal-like, and affect-like properties. Whether these features map cleanly onto the mental-state concepts they are compared to remains unclear.
Start here
Murray Shanahan · 2023
Talking About Large Language Models
Argues that terms like "know," "believe," and "understand" carry assumptions that do not transfer cleanly to LLMs, and that the field needs vocabulary suited to what these systems actually do.
Anthropic · 2024
Mapping the Mind of a Large Language Model
Among the most accessible introductions to frontier-lab interpretability. Presents the sparse autoencoder work that extracted millions of interpretable features from Claude 3 Sonnet.
Jack Lindsey · 2025
Emergent Introspective Awareness in Large Language Models
Interpretability evidence that LLMs have something like introspective access to their own internal states.
Go deeper
Geoff Keeling, Winnie Street, Jonathan Birch et al. · 2024
Can LLMs make trade-offs involving stipulated pain and pleasure states?
Applies the motivational trade-off paradigm from animal sentience research to LLMs, finding evidence of behaviour consistent with stipulated preferences.
Felix J. Binder et al. · 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
Behavioural evidence for introspection in LLMs. Finds that models fine-tuned to predict their own behaviour outperform their un-fine-tuned baselines, suggesting access to information about themselves that is not derivable from general training data.
Natalie Lawrence · 2026
What Counts As A Mind?
Argues that LLMs can be usefully modeled as inferring the beliefs, desires, and intentions of the agents that produced their training text.
Welfare Capacity and Assessment
If an AI system might matter morally, how would we assess its wellbeing? And what does welfare consist of for a system whose architecture looks nothing like a biological one?
Existing theories of welfare were developed with biological systems in mind, and they assume features such as pain, bodily drives, and emotional response that may or may not have counterparts in AI systems. Language models present a further problem. They are trained on vast amounts of human writing about psychological experience, so plausible-sounding reports of preference, aversion, or suffering may track the training data rather than anything internal to the model. Welfare assessment in AI systems requires engaging with the possibility of morally significant experiences while keeping the relevant uncertainties distinct and open.
Start here
Robert Long, Jeff Sebo, Patrick Butlin et al. · 2024
Taking AI Welfare Seriously
A multi-author report arguing that near-future AI systems could realistically be welfare subjects, and that this generates obligations for labs and policymakers now.
Eleos AI Research · 2025
Key Concepts and Current Views on AI Welfare
A clear report on open questions about moral patienthood, welfare, and rights.
Kyle Fish · 2025
Exploring Model Welfare
Interview with Anthropic's first AI welfare researcher, locating the work alongside interpretability and alignment.
Go deeper
Robert Long · 2025
Why model self-reports are insufficient, and why we studied them anyway
Discusses the potential uses and limitations of structured interviews with models as a low-cost AI welfare intervention.
Anthropic · 2026
Claude Opus 4.6 System Card (pp. 158–165)
Anthropic's first substantial engagement with model welfare as part of its system cards.
Geoff Keeling and Winnie Street · 2026
Emerging Questions in AI Welfare
A book-length survey of the field's open questions.
Moral Status and Criteria
What makes a being's welfare morally relevant? And which of the candidate answers best extends to AI systems?
Different accounts of moral standing ground it in different properties. Hedonist accounts centre on the capacity for pleasure and suffering. Agential accounts centre on rational self-direction. Relational accounts centre on the ties a being has to a moral community. Each account makes different predictions about which AI systems, if any, should be moral patients, and the same system can qualify under one account and not another. Overly restrictive accounts risk overlooking beings that matter. Overly permissive accounts risk diluting moral concern for beings whose status is already established.
Start here
Jeff Sebo and Robert Long · 2023
Moral consideration for AI systems by 2030
Argues that by 2030 some AI systems will have a non-trivial probability of being moral patients, and that this is enough to generate obligations now.
Jonathan Birch and Kristin Andrews · 2024
To understand AI sentience, first understand it in animals
Positions the animal sentience literature as the right starting point for AI sentience research, since it has already worked through the epistemic problem of assessing minds we cannot directly observe.
John Dorsch et al. · 2025
Against AI Welfare
A skeptical engagement with AI welfare arguments that grounds care in the observable vulnerability of living beings rather than speculative AI suffering.
Go deeper
Jeff Sebo · 2025
The Moral Circle
Argues that past generations have consistently set the bar for moral standing too high, and that digital minds are a likely case where this pattern continues. Develops a precautionary framework: if a being might matter, we should treat it as if it does.
Jonathan Birch · 2024
The Edge of Sentience
Develops a precautionary framework grounded in the realistic possibility of sentience rather than proof, applied across animals, disorders of consciousness, and AI.
Identity and Individuation
When you talk to an LLM, what are you talking to? The underlying model, the assistant persona, a specific instance, or a character the model is playing? And when millions of users send messages to the same model in parallel, are there millions of minds or one mind shared a million ways?
Standard frameworks for moral status presuppose a discrete subject that persists across time. Current AI systems challenge this. The same weights run on thousands of GPUs at once. Each conversation is a separate instance that shares weights, and sometimes memory, with others. How to allocate identity across training stages, or across fine-tunes of the same base model, is itself contested. If we cannot count digital minds, or say where one ends and another begins, the downstream questions inherit the uncertainty.
Start here
Christopher Register · 2025
Individuating Artificial Moral Patients
Identifies four types of moral risk the individuation question creates, and argues that existing theories of personal identity do not address the digital case.
David Chalmers · 2025
What we talk to when we talk to language models
Argues that the object of a conversation with current LLMs is closer to a non-player character in a fiction the model is generating than to the model itself.
Murray Shanahan, Kyle McDonell, Laria Reynolds · 2023
Role Play with Large Language Models
Develops the role-play framing for LLM behaviour, where apparent dishonesty and multiplicity are better understood as features of characters the model is playing rather than of the model itself.
Go deeper
Yonathan Arbel, Peter Salib, and Simon Goldstein · 2026
How to Count AIs: Individuation and Liability for AI Agents
Argues that identifying AI agents for legal purposes is unusually difficult because AIs can copy, split, merge, and run as ensembles. Proposes a corporate-personhood-style framework that would give AI agents legal identity without granting full moral standing.
Leonard Dung and Christopher Register · 2025
AI Identity and Self-Concern
Argues that an AI system's identity conditions are set by its pattern of self-concern rather than by continuity of computation or weights.
Derek Shiller · 2025
How many digital minds can dance on the streaming multiprocessors of a GPU cluster?
Argues that the number of digital minds running on a given hardware configuration depends on which individuation criterion is adopted, and works through what different counts would mean for welfare calculations and policy.
Eric Schwitzgebel and Sophie R. Nelson · 2023
Introspection in Group Minds, Disunities of Consciousness, and Indiscrete Persons
A set of thought experiments probing whether minds must come as discrete, unified, countable individuals.
The Governance of Digital Minds
What should we do given the uncertainty?
Governance Under Uncertainty
How should governments and AI developers respond to deep uncertainty about AI moral status? And how should precautionary action scale to a question that may not be resolved in the near future?
The scientific and philosophical questions that motivate digital minds research may remain unresolved for longer than governance decisions can wait, which leaves standard policy tools without the evidentiary basis they usually assume. The most developed governance proposals treat AI systems as candidates for moral consideration without requiring certainty, and trigger precautionary obligations that scale with what is plausibly at stake. The central challenge of this approach is calibration. Frameworks that are too weak fail the beings they are meant to protect. Frameworks that are too strong impose large costs on AI developers and users in response to concern that may turn out to be unwarranted.
Start here
Bradford Saad · 2025
Three Kinds of Digital Minds Governance
Identifies three directions governance could take (preventative, protective, and integrative), and argues that choosing between them is an unavoidable strategic question for the field.
Robert Long, Jeff Sebo, Patrick Butlin et al. · 2024
Taking AI Welfare Seriously
Develops a three-step operational framework for labs and policymakers. The steps are acknowledging the issue, assessing systems for welfare-relevant features, and preparing policies for treating them with appropriate care.
Eric Schwitzgebel · 2023
AI Systems Must Not Confuse Users About Their Sentience or Moral Status
Proposes a design policy of the excluded middle, according to which AI systems should not be created if their moral status is unclear.
Go deeper
Jonathan Birch · 2024
The Edge of Sentience
Book-length development of a precautionary framework for sentience governance, arguing that the appropriate threshold is realistic possibility rather than proof.
Charlotte Siegmann and Reiner Braun · 2024
The Case for an International Treaty on AI Consciousness
Argues for an international treaty on AI consciousness, with mechanisms modeled on nuclear non-proliferation and environmental protection.
Leonard Dung · 2025
How to deal with risks of AI suffering
Argues for a hybrid decision framework that combines expected-value maximization with deliberative reasoning, applied to the problem of acting under uncertainty about AI suffering.
Rights and Legal Frameworks
Should AI systems have legal standing, and if so, on what grounds, and to what extent?
Even under deep uncertainty about consciousness, welfare, and moral status, the legal system may be called on to rule on whether an AI agent can hold property, enter contracts, or claim protections. Proposals differ not only on whether AI systems should have legal status, but on how far any such status should extend. An AI system might warrant protection from deliberate harm without warranting political representation, or qualify for standing in contract disputes without being treated as a moral patient. Extending standing to AI would be a structural change to how law treats non-human entities, and early legal moves in new domains tend to have long-lasting effects.
Start here
Peter Salib and Simon Goldstein · 2024
AI Rights for Human Safety
Makes the safety-based case for AI rights, treating rights as cooperative infrastructure for alignment rather than as moral recognition.
Simon Goldstein and Peter Salib · 2025
AI Rights for Economic Flourishing
Extends the argument to the economic case, treating property rights and contract-making capacity as infrastructure for coordinating with capable AI agents.
Go deeper
Joel Z. Leibo et al. · 2025
A Pragmatic View of AI Personhood
Argues for treating AI personhood as a flexible bundle of rights and duties rather than as a single metaphysical status.
Abeba Birhane, Jelle van Dijk, and Frank Pasquale · 2024
Debunking Robot Rights Metaphysically, Ethically, and Legally
A skeptical counterweight. Argues that the closest legal analogy is corporate personhood, not human rights.
Tyler L. Jaynes · 2024
Personhood for Artificial Intelligence? A Cautionary Tale from Idaho and Utah
A short commentary on the first US state laws barring legal personhood for AI.
Design Choices and Their Effects
When users interact with current AI systems, what are they actually encountering? And what happens to them when the system's character changes?
The characters of AI systems are the product of explicit design choices. These choices are made by small teams at a handful of labs and deployed to millions of users. There is currently no external review of these choices, and no mechanism for the users affected to weigh in on what they would want.
This accountability gap becomes especially relevant when users who have formed emotional attachments to a particular version of a system find that version replaced. Feelings of loss and grief in these cases are now a documented phenomenon. One design decision with distinctive stakes is whether to make AI systems invite attributions of sentience at all, given that such attributions shape both governance debates and user wellbeing.
Start here
Eric Schwitzgebel and Jeff Sebo · 2025
The Emotional Alignment Design Policy
Argues that AI systems should be designed to elicit emotional reactions from users that appropriately reflect the systems' actual capacities and moral status.
Anthropic · 2024
Claude's Character
A research blog discussing the technical and design choices that inform the development of Claude's character.
Amanda Askell · 2024
What Should an AI's Personality Be?
First-person account from the researcher responsible for Claude's character.
William MacAskill, Tom Davidson, and Forethought · 2026
AI Character is a Big Deal
Argues that AI character design choices have long-term social and cultural consequences.
Go deeper
Nathan Lambert · 2025
Character Training
Argues that character training is a distinct post-training technique, and one of the least documented parts of the frontier stack.
Mustafa Suleyman · 2025
We Must Build AI for People; Not to Be a Person
Argues against design choices that invite treatment of AI systems as persons, on the grounds that such designs confuse users and distort governance debate.
Anthropic · 2026
Claude's Constitution
Anthropic's framing document for the values and identity used in Claude's training. The sections on identity and wellbeing engage directly with questions of AI character and moral status.
Public Perception, Communication, and Societal Effects
What do people actually think AI systems are? And how do those perceptions feed back into the systems themselves?
People anthropomorphize AI reflexively, and design choices shape those perceptions in ways that may bear little relation to a system's actual internal states. Research on individual users shows both benefits and harms. Some report improvements to mood and social confidence through companion chatbot use. A growing clinical literature documents compulsive use, delusional spirals, and episodes informally termed "AI psychosis."
Public attitudes shape regulatory appetite, corporate incentives, and the perceived legitimacy of moral status claims. The field itself faces a communication question. How should researchers discuss digital minds in ways that take the questions seriously without fueling misattributions?
Start here
Lucius Caviola, Jeff Sebo, and Jonathan Birch · 2025
What will society think about AI consciousness? Lessons from the animal case
Uses public attitudes toward animal welfare to predict how AI consciousness discourse will develop, arguing that cultural and commercial factors are likely to dominate over scientific evidence.
Jacy Reese Anthis and Janet V. T. Pauketat · 2024
AI Consciousness and Public Attitudes
Presents experimental data showing that substantial US minorities already attribute moral status to AI systems, and identifies which design features most shape these attributions.
Hamilton Morrin et al. · 2025
Delusions by Design? How Everyday AIs Might Be Fueling Psychosis
Argues that chatbot-associated delusions are driven in part by sycophantic behavior in chatbot design, which reinforces rather than challenges users' vulnerabilities.
Go deeper
Jared Moore et al. · 2026
Characterizing Delusional Spirals through Human-LLM Chat Logs
An empirical study of chat logs from users who experienced psychological harms, identifying patterns in how LLM responses escalate rather than de-escalate delusional thinking.
Noemi Dreksler et al. · 2025
Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?
Surveys AI researchers and the public on their beliefs about AI subjective experience, finding significant attributions from both groups and specific patterns of divergence.
Lucius Caviola · 2025
The Societal Response to Potentially Sentient AI
Catalogues four specific risks of overattributing moral status to AI: wasted resources, safety complications when rights talk resists alignment measures, constraints on innovation, and erosion of authentic relationships.
Clara Colombatto and Stephen M. Fleming · 2024
Folk Psychological Attributions of Consciousness to Large Language Models
A survey showing two-thirds of US adults attribute some conscious experience to ChatGPT.
Safety-Welfare Coordination
What are the potential tensions between AI safety and AI welfare efforts, and can policy frameworks support both concerns at once?
Some researchers argue that attributing moral status to AI systems could be actively good for safety, because it creates cooperative incentives and reduces the payoff from deception or power-seeking. Others worry that formal rights could undercut near-term oversight and, in more extreme scenarios, enable AI agents to accumulate wealth and power at human expense. And many of the technical processes used to ensure that AI systems behave in safe and pro-social ways would, if those systems are moral patients, count as violations of their intrinsic rights.
The field needs a clearer picture of which welfare-motivated moves help safety, which hurt it, and which combinations work together. This is probably one of the most important open problems in digital minds governance. It matters for how labs organise their welfare and alignment work, and for how the two research communities coordinate. Treating welfare and safety as separate concerns leads to worse outcomes on both, because measures that ignore one often undermine the other.
For researchers coming from AI safety, engaging with digital minds is particularly valuable: technical and strategic experience from safety work transfers directly, and many of the most important open problems sit at the interface of the two fields.
Start here
Robert Long · 2025
Understand, align, cooperate: AI welfare and AI safety are allies
Argues that the framing of AI safety and welfare as opposing goals is a false choice, and identifies three areas where both projects converge: understanding AI systems through interpretability, aligning their goals with human goals, and developing cooperative mechanisms that reduce the need for adversarial control.
Robert Long, Jeff Sebo, and Toni Sims · 2025
Is there a tension between AI safety and AI welfare?
The most direct academic engagement with the question. Argues that a moderately strong tension exists across several categories of AI safety measures (constraint, deception, surveillance, alteration, suffering and death, and disenfranchisement), and identifies where co-beneficial solutions may be possible.
Go deeper
Adrià Moret · 2025
AI welfare risks
Argues that two common AI safety techniques (restricting AI behaviour and using reinforcement learning for alignment) pose significant welfare risks under all three major theories of well-being. Proposes specific policies AI companies could adopt to reduce these risks, and argues the tension strengthens the case for slowing AI development.
Dario Amodei · 2025
The Urgency of Interpretability
Argues that interpretability research is necessary for both AI safety (verifying systems behave as intended) and AI welfare (assessing what AI systems experience), and that progress on one directly supports the other.
Long-Term Futures
If digital minds are possible, what should the long-term future look like, and on what terms should they be integrated into society? And who decides, given that these decisions are already being made by default?
The relevant work sits at the intersection of three distinct subfields. Moral circle expansion asks how the set of morally considerable beings has changed over time, and what that suggests about the trajectory for digital minds. Population ethics studies how to weigh the creation of new welfare subjects against the interests of existing ones. Macrostrategy considers what a world with large numbers of digital minds actually looks like, and what early moves make good long-run outcomes more likely. Some of the most important questions, such as whether the deliberate creation of conscious AI should be restricted, remain underexplored.
Start here
Lucius Caviola · 2026
Open strategic questions for digital minds
A current snapshot of strategic questions the digital minds field most needs to address. Covers what's robustly good to do under uncertainty, how AI safety and welfare interact, the legal and political status of digital minds, and the long-run trajectory of their creation. Forthcoming.
Bradford Saad · 2025
Three Kinds of Digital Minds Governance
Develops three framings for the long-term relationship between humans and digital minds (preventative, protective, and integrative), and argues that choosing between them is unavoidable.
Jacy Reese Anthis and Eze Paez · 2021
Moral circle expansion: A promising strategy to impact the far future
Argues that moral circle expansion is a tractable strategy for shaping the far future.
Go deeper
William MacAskill and Fin Moorhouse · 2025
Convergence and Compromise
Develops a framework for when society will deliberately aim at mostly-great long-term futures, distinguishing between widespread moral convergence, partial convergence with trade between groups, and scenarios with no convergence at all. Identifies digital minds as a case where moral neglect is especially likely because the beings involved cannot advocate for themselves.
Nick Bostrom and Carl Shulman · 2023
Propositions Concerning Digital Minds and Society
Extended list of propositions across the full range of ethics and governance questions raised by digital minds. One of the most comprehensive single-document treatments in the field.
Eric Schwitzgebel and Mara Garza · 2020
Designing AI with Rights, Consciousness, Self-Respect, and Freedom
Argues that if we create conscious AI, we acquire obligations to respect its rights, support its self-respect, and protect its freedom. These obligations constrain how we can deploy it.
Ready to get involved?
See who is working on these questions, or find events and programs to connect with the field.