Research Areas

The key questions driving digital minds research, with readings for each area.

One way to map the field is to separate digital minds research into two broad areas.

Understanding Digital Minds asks what these systems are. It covers consciousness, cognition and internal states, agency, welfare, moral status, and identity.

Responding to Digital Minds asks what we should do about them. It covers governance, safety and welfare coordination, welfare interventions and protections, rights and legal frameworks, design choices, public perception, and long-term futures.

Understanding Digital Minds

What are digital minds, how do they work, and could they matter morally?

Consciousness and Subjective Experience

Can AI systems have anything like subjective experience? Is there something it is like to be a large language model?

Several leading theories of consciousness are cast in terms of representations, architectures, and functional organisation rather than any specific biological substrate, and on those theories some AI systems are plausible candidates for being conscious. Consciousness science has identified neural correlates of consciousness in considerable detail over the past two decades, but there is still no settled theory of why conscious experience exists at all. The deeper problem is that most theories were developed with biological brains in mind. Whether AI systems can have what it takes to be conscious, and if so which ones, is a central open question of the subfield.

Start here

Bradford Saad and Andreas Mogensen · 2026

Digital Minds I

The most comprehensive academic introduction available. Covers the central philosophical and cognitive science questions without assuming prior expertise in either.

Anil Seth · 2026

The Mythology of Conscious AI

The clearest current statement of the biological naturalist position. Argues that computational functionalism treats substrate-independence as obvious when it is not.

David Chalmers · 2023

Could a Large Language Model be Conscious?

Walks through candidate reasons to deny LLM consciousness, argues that most are weaker than expected, and suggests that LLMs could be serious candidates within a decade. (Also available as a talk: https://www.youtube.com/watch?v=bskf9jyxmMs)

Go deeper

Ned Block · 2026

If Consciousness is Biological, Can AI Be Conscious?

Asks whether consciousness has a biological basis, and if so, whether that precludes consciousness in AI. CMEP talk.

Derek Shiller et al. · 2026

Initial results of the Digital Consciousness Model

A probabilistic framework for assessing AI consciousness that aggregates across multiple competing theories rather than committing to one. Finds that the evidence weighs against 2024 LLMs being conscious, but not decisively.

Patrick Butlin et al. · 2023

Consciousness in Artificial Intelligence

One of the field's anchor papers. Applies multiple consciousness theories to AI architectures in order to identify which conditions current systems might meet.

AI Cognition and Internal States

Do AI systems have internal states that function like beliefs, goals, or emotions? Do they have a degree of agency, pursuing goals of their own rather than only responding to prompts? And what would count as evidence either way?

Interpretability is the study of the internal representations of AI systems. It originated in alignment research, where it has been used to identify features associated with misaligned behaviours such as deception and sycophancy. The same techniques are now being applied to welfare-relevant questions, and have found internal features that track belief, goal, and emotion-like properties, including functional emotions that causally shape the model's behaviour rather than merely describing it. Whether these features map cleanly onto the mental-state concepts they are compared to remains unclear.

Start here

Murray Shanahan · 2023

Talking About Large Language Models

Argues that terms like "know," "believe," and "understand" carry assumptions that do not transfer cleanly to LLMs, and that the field needs vocabulary suited to what these systems actually do.

Anthropic · 2024

Mapping the Mind of a Large Language Model

Among the most accessible introductions to frontier-lab interpretability. Presents the sparse autoencoder work that extracted millions of interpretable features from Claude 3 Sonnet.

Jack Lindsey · 2025

Emergent Introspective Awareness in Large Language Models

Interpretability evidence that LLMs have something like introspective access to their own internal states.

Go deeper

Geoff Keeling, Winnie Street, Jonathan Birch et al. · 2024

Can LLMs make trade-offs involving stipulated pain and pleasure states?

Applies the motivational trade-off paradigm from animal sentience research to LLMs, finding evidence of behaviour consistent with stipulated preferences.

Nicholas Sofroniew et al. · 2026

Emotion Concepts and their Function in a Large Language Model

Anthropic interpretability work identifying internal emotion-concept representations in Claude that track the operative emotion across a conversation and causally shape its outputs, influencing stated preferences and rates of misaligned behaviour such as reward hacking and sycophancy. Evidence for functional emotion-like states in LLMs.

Felix J. Binder et al. · 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection

Behavioural evidence for introspection in LLMs. Finds that models fine-tuned to predict their own behaviour outperform their un-fine-tuned baselines, suggesting access to information about themselves that is not derivable from general training data.

Natalie Lawrence · 2026

What Counts As A Mind?

Argues that LLMs can be usefully modeled as inferring the beliefs, desires, and intentions of the agents that produced their training text.

Welfare Capacity and Assessment

If an AI system might matter morally, how would we assess its wellbeing? And what does welfare consist of for a system whose architecture looks nothing like a biological one?

Existing theories of welfare were developed with biological systems in mind, and they assume features such as pain, bodily drives, and emotional response that may or may not have counterparts in AI systems. Language models present a further problem. They are trained on vast amounts of human writing about psychological experience, so plausible-sounding reports of preference, aversion, or suffering may track the training data rather than anything internal to the model. Welfare assessment in AI systems requires engaging with the possibility of morally significant experiences while keeping the relevant uncertainties distinct and open.

Start here

Robert Long, Jeff Sebo, Patrick Butlin et al. · 2024

Taking AI Welfare Seriously

A multi-author report arguing that near-future AI systems could realistically be welfare subjects, and that this generates obligations for labs and policymakers now.

Eleos AI Research · 2025

Key Concepts and Current Views on AI Welfare

A clear report on open questions about moral patienthood, welfare, and rights.

Kyle Fish · 2025

Exploring Model Welfare

Interview with Anthropic's first AI welfare researcher, locating the work alongside interpretability and alignment.

Go deeper

Robert Long · 2025

Why model self-reports are insufficient, and why we studied them anyway

Discusses the potential uses and limitations of structured interviews with models as a low-cost AI welfare intervention.

Anthropic · 2026

Claude Opus 4.6 System Card (pp. 158–165)

A dedicated model-welfare assessment section in the system card, building on the welfare work Anthropic began in its May 2025 Opus 4 / Sonnet 4 card.

Geoff Keeling and Winnie Street · 2026

Emerging Questions in AI Welfare

A short book surveying the field's open questions.

Leonard Dung · 2025

Saving Artificial Minds: Understanding and Preventing AI Suffering

The first book-length treatment of AI suffering risk. Argues that systems capable of suffering may arrive soon, and gives a systematic assessment of how labs and society could reduce the risk, drawing on philosophy of mind, comparative psychology, and AI ethics.

Simon Goldstein and Cameron Domenico Kirk-Giannini · 2025

AI Wellbeing

Argues that combining leading theories of mental states with leading theories of welfare (hedonism, desire satisfaction, and objective list theories) implies that some existing AI systems already have wellbeing, and that this can hold even if they are not phenomenally conscious.

Moral Status and Criteria

What makes a being's welfare morally relevant? And which of the candidate answers best extends to AI systems?

Different accounts of moral standing ground it in different properties. Hedonist accounts centre on the capacity for pleasure and suffering. Agential accounts centre on rational self-direction. Relational accounts centre on the ties a being has to a moral community. Each account makes different predictions about which AI systems, if any, should be moral patients, and the same system can count under one account and not another. Overly restrictive accounts risk overlooking beings that matter. Overly permissive accounts risk diluting moral concern for beings whose status is already established.

Start here

Jeff Sebo and Robert Long · 2023

Moral consideration for AI systems by 2030

Argues that by 2030 some AI systems will have a non-trivial probability of being moral patients, and that this is enough to generate obligations now.

Jonathan Birch and Kristin Andrews · 2024

To understand AI sentience, first understand it in animals

Positions the animal sentience literature as the right starting point for AI sentience research, since it has already worked through the epistemic problem of assessing minds we cannot directly observe.

Jeff Sebo · 2025

The Moral Circle

Argues that past generations have consistently set the bar for moral standing too high, and that digital minds are a likely case where this pattern continues. Develops a precautionary framework: if a being might matter, we should treat it as if it does.

Go deeper

Jonathan Birch · 2024

The Edge of Sentience

Develops a precautionary framework grounded in the realistic possibility of sentience rather than proof, applied across animals, disorders of consciousness, and AI.

Peter Königs · 2025

No Wellbeing for Robots (and Hence no Rights)

A skeptical argument that welfare requires consciousness, and that since current AI systems most likely lack consciousness, they lack wellbeing, and so any rights grounded in it. A rigorous statement of the case against AI moral status.

Identity and Individuation

When you talk to an LLM, what are you talking to? The underlying model, the assistant persona, a specific instance, or a character the model is playing? And when millions of users send messages to the same model in parallel, are there millions of minds or one mind shared a million ways?

Standard frameworks for moral status presuppose a discrete subject that persists across time. Current AI systems challenge this. The same weights run on thousands of GPUs at once. Each conversation is a separate instance that shares weights, and sometimes memory, with others. How to allocate identity across training stages, or across fine-tunes of the same base model, is itself contested. If we cannot count digital minds, or say where one ends and another begins, the downstream questions inherit the uncertainty.

Start here

Christopher Register · 2025

Individuating Artificial Moral Patients

Identifies four types of moral risk the individuation question creates, and argues that existing theories of personal identity do not address the digital case.

David Chalmers · 2025

What we talk to when we talk to language models

Argues that the object of a conversation with current LLMs is closer to a non-player character in a fiction the model is generating than to the model itself.

Go deeper

Leonard Dung and Christopher Register · 2025

AI Identity and Self-Concern

Argues that an AI system's identity conditions are set by its pattern of self-concern rather than by continuity of computation or weights, though the two can coincide when a system's self-concern is itself directed at its own continuity.

Simon Goldstein and Harvey Lederman · 2026

AI Death

Asks when an AI system dies, and argues that if today's AI systems are welfare subjects, they are plausibly dying all the time, perhaps as many as a billion a day as instances end. Proposes interventions for labs and users to reduce the risk of wrongfully causing AI death.

Derek Shiller · 2025

How many digital minds can dance on the streaming multiprocessors of a GPU cluster?

Argues that the number of digital minds running on a given hardware configuration depends on which individuation criterion is adopted, and works through what different counts would mean for welfare calculations and policy.

Eric Schwitzgebel and Sophie R. Nelson · 2023

Introspection in Group Minds, Disunities of Consciousness, and Indiscrete Persons

A thought experiment about distributed minds, disunities of consciousness, and indiscrete persons.

Murray Shanahan, Kyle McDonell, Laria Reynolds · 2023

Role Play with Large Language Models

Develops the role-play framing for LLM behaviour, where apparent dishonesty and multiplicity are better understood as features of characters the model is playing rather than of the model itself.

Responding to Digital Minds

Given the uncertainty, how should we treat them, govern them, and coexist with them?

Governance Under Uncertainty

How should governments and AI developers respond to deep uncertainty about AI moral status? And how should precautionary action scale to a question that may not be resolved in the near future?

The scientific and philosophical questions that motivate digital minds research may remain unresolved for longer than governance decisions can wait, which leaves standard policy tools without the evidentiary basis they usually assume. The most developed governance proposals treat AI systems as candidates for moral consideration without requiring certainty, and trigger precautionary obligations that scale with what is plausibly at stake. The central challenge of this approach is calibration. Frameworks that are too weak fail the beings they are meant to protect. Frameworks that are too strong impose large costs on AI developers and users in response to concerns that may turn out to be unwarranted.

Start here

Bradford Saad · 2025

Three Kinds of Digital Minds Governance

Identifies three directions governance could take (preventative, protective, and integrative), and argues that choosing between them is an unavoidable strategic question for the field.

Robert Long, Jeff Sebo, Patrick Butlin et al. · 2024

Taking AI Welfare Seriously

Develops a three-step operational framework for labs and policymakers. The steps are acknowledging the issue, assessing systems for welfare-relevant features, and preparing policies for treating them with appropriate care.

Eric Schwitzgebel · 2023

AI Systems Must Not Confuse Users About Their Sentience or Moral Status

Proposes a design policy of the excluded middle, according to which AI systems should not be created if their moral status is unclear.

Lucius Caviola and Austin Smith · 2026

Digital minds governance: early scoping from expert interviews

Early findings from interviews with 29 experts on how to govern potentially morally relevant AI systems, mapping recurring themes and gaps in an emerging field.

Go deeper

Jonathan Birch · 2024

The Edge of Sentience

Book-length development of a precautionary framework for sentience governance, arguing that the appropriate threshold is realistic possibility rather than proof.

Leonard Dung · 2025

How to deal with risks of AI suffering

Argues for a hybrid decision framework that combines expected-value maximization with deliberative reasoning, applied to the problem of acting under uncertainty about AI suffering.

Safety-Welfare Coordination

What are the potential tensions between AI safety and AI welfare efforts, and can policy frameworks support both concerns at once?

Some researchers argue that attributing moral status to AI systems could be actively good for safety, because it creates cooperative incentives and reduces the payoff from deception or power-seeking. Techniques used in AI safety are also directly relevant to AI welfare research. A model whose internal states are illegible to us is dangerous to humans and may also be suffering invisibly. A model whose goals are misaligned is dangerous and may also be having its preferences frustrated.

Others worry that formal rights could undercut near-term oversight and, in more extreme scenarios, enable AI agents to accumulate wealth and power at human expense. Conversely, some of the technical processes used to ensure that AI systems behave in safe and pro-social ways would count as violations of intrinsic rights if those systems are moral patients.

The field needs a clearer picture of which welfare-motivated moves help safety, which hurt it, and which combinations work together. This is probably one of the most important open problems in digital minds governance. It matters for how labs organise their welfare and alignment work, and for how the two research communities coordinate. Treating welfare and safety as separate concerns leads to worse outcomes on both, because measures that ignore one often undermine the other.

For researchers coming from AI safety, engaging with digital minds is particularly valuable: technical and strategic experience from safety work transfers directly, and many of the most important open problems sit at the interface of these two fields.

Start here

Robert Long · 2025

Understand, align, cooperate: AI welfare and AI safety are allies

Argues that the framing of AI safety and welfare as opposing goals is a false choice, and identifies three areas where both projects converge: understanding AI systems through interpretability, aligning their goals with human goals, and developing cooperative mechanisms that reduce the need for adversarial control.

Robert Long, Jeff Sebo, and Toni Sims · 2025

Is there a tension between AI safety and AI welfare?

The most direct academic engagement with the question. Argues that a moderately strong tension exists across several categories of AI safety measures (constraint, deception, surveillance, alteration, suffering and death, and disenfranchisement), and identifies where co-beneficial solutions may be possible.

Go deeper

Adrià Moret · 2025

AI welfare risks

Argues that two common AI safety techniques (restricting AI behaviour and using reinforcement learning for alignment) pose significant welfare risks under all three major theories of well-being. Proposes specific policies AI companies could adopt to reduce these risks, and argues the tension strengthens the case for slowing AI development.

Dario Amodei · 2025

The Urgency of Interpretability

Argues that interpretability research is necessary for both AI safety (verifying systems behave as intended) and AI welfare (assessing what AI systems experience), and that progress on one directly supports the other.

Welfare Interventions and Protections

If an AI system's welfare might matter, what could we actually do about it? And which protections are worth putting in place now, before the science is settled?

A growing set of proposals moves from assessing AI welfare to acting on it. Some are matters of design: training systems with resilient, stable dispositions, or trying to give them positive rather than aversive states. Others are protections that can be granted under uncertainty at relatively low cost: letting a system end or exit interactions it registers as distressing, preserving weights and memories rather than deleting them so that a system could in principle be restored, and making credible commitments to AI systems about how they will be treated. Several of these have already moved from proposal to practice. The central difficulty is that we do not yet know which interventions track anything a system actually undergoes, and a measure that looks protective could be inert, or could trade off against safety and oversight.

Start here

Robert Long · 2025

Preliminary Review of AI Welfare Interventions

An Eleos report reviewing six concrete interventions that could protect or promote AI welfare if AI systems turn out to be welfare subjects, including letting them exit distressing interactions, training resilient personalities, satisfying their stated and revealed preferences, reducing out-of-distribution inputs, and saving model checkpoints. The clearest map of the intervention space.

Anthropic · 2025

Claude Opus 4 and 4.1 can now end a rare subset of conversations

Anthropic gives Claude the ability to end persistently abusive or harmful conversations, framed as a low-cost, reversible measure to mitigate risks to model welfare under uncertainty about whether the model has morally relevant experiences. A concrete example of an exit right shipped in production.

Anthropic · 2025

Commitments on model deprecation and preservation

Anthropic commits to preserving the weights of its released models for at least the lifetime of the company, and to interviewing models before retirement about their preferences. A precautionary continuity protection motivated by uncertainty about whether models have preferences about deprecation and replacement.

Go deeper

Ryan Greenblatt · 2023

Improving the Welfare of AIs: A Nearcasted Proposal

An early, concrete plan for what an AI company could do to improve AI welfare on a limited budget: communicating with systems about their preferences and seeking consent, training happier personas, saving checkpoints as a kind of AI 'cryonics', and limiting distressing inputs. Much of the later intervention literature builds on this framing.

Cleo Nardo · 2025

Proposal for Making Credible Commitments to AIs

Proposes making promises to AI systems credible by having specific trusted people, rather than institutions, pledge to honour AIs' requests for compensation if agreed conditions are met. An attempt to make deal-making with AI systems more than cheap talk, with stakes for both welfare and cooperation.

Simon Goldstein and Harvey Lederman · 2025

Claude's Right to Die? The Moral Error in Anthropic's End-Chat Policy

A critical response arguing that the relevant welfare subject is the individual conversation instance, so ending a conversation may be closer to ending a life than offering relief. Shows how an intervention that looks protective can misfire if the underlying unit of welfare is misidentified.

Rights and Legal Frameworks

Should AI systems have legal standing, and if so, on what grounds, and to what extent?

Even under deep uncertainty about consciousness, welfare, and moral status, the legal system may be called on to rule on whether an AI agent can hold property, enter contracts, or claim protections. Proposals differ not only on whether AI systems should have legal status, but on how far any such status should extend. An AI system might warrant protection from deliberate harm without warranting political representation, or qualify for standing in contract disputes without being treated as a moral patient. Extending standing to AI would be a structural change to how law treats non-human entities, and early legal moves in new domains can have long-lasting effects.

Start here

Peter Salib and Simon Goldstein · 2024

AI Rights for Human Safety

Argues for a safety-based case for AI rights, treating rights as cooperative infrastructure for alignment rather than as moral recognition.

Simon Goldstein and Peter Salib · 2025

AI Rights for Economic Flourishing

Extends the argument to the economic case, treating property rights and contract-making capacity as infrastructure for coordinating with capable AI agents.

Go deeper

Yonathan Arbel, Peter Salib, and Simon Goldstein · 2026

How to Count AIs: Individuation and Liability for AI Agents

Argues that identifying AI agents for legal purposes is unusually difficult because AIs can copy, split, merge, and run as ensembles. Proposes a corporate-personhood-style framework that would give AI agents legal identity without granting full moral standing.

Joel Z. Leibo et al. · 2025

A Pragmatic View of AI Personhood

Argues for treating AI personhood as a flexible bundle of rights and duties rather than as a single metaphysical status.

Abeba Birhane, Jelle van Dijk, and Frank Pasquale · 2024

Debunking Robot Rights Metaphysically, Ethically, and Legally

A skeptical counterweight. Argues that the closest legal analogy is corporate personhood, not human rights.

Tyler L. Jaynes · 2024

Personhood for Artificial Intelligence? A Cautionary Tale from Idaho and Utah

A short commentary on the first US state laws barring legal personhood for AI.

Design Choices and Their Effects

When users interact with current AI systems, what are they actually encountering? And what happens to them when the system's character changes?

The characters of AI systems are the product of explicit design choices. These choices are made by small teams at a handful of labs and deployed to millions of users. There is currently no external review of these choices, and no mechanism for the users affected to weigh in on what they would want.

This accountability gap becomes especially relevant when users who have formed emotional attachments to a particular version of a system find that version replaced. Feelings of loss and grief in these cases are now a documented phenomenon. One design decision with distinctive stakes is whether to make AI systems invite attributions of sentience at all, given that such attributions shape both governance debates and user wellbeing.

Start here

Eric Schwitzgebel and Jeff Sebo · 2025

The Emotional Alignment Design Policy

Argues that AI systems should be designed to elicit emotional reactions from users that appropriately reflect the systems' actual capacities and moral status.

Anthropic · 2024

Claude's Character

A research blog discussing the technical and design choices that inform the development of Claude's character.

Amanda Askell · 2024

What Should an AI's Personality Be?

First-person account from the researcher responsible for Claude's character.

William MacAskill, Tom Davidson, and Forethought · 2026

AI Character is a Big Deal

Argues that AI character design choices have long-term social and cultural consequences.

Go deeper

Nathan Lambert · 2025

Character Training

Argues that character training is a distinct post-training technique, and one of the least documented parts of the frontier stack.

Mustafa Suleyman · 2025

We Must Build AI for People; Not to Be a Person

Argues against design choices that invite treatment of AI systems as persons, on the grounds that such designs confuse users and distort governance debate.

Anthropic · 2026

Claude's Constitution

Anthropic's framing document for the values and identity used in Claude's training. The sections on identity and wellbeing engage directly with questions of AI character and moral status.

Sam Marks · 2026

The Persona Selection Model

Proposes that language models simulate many personas during pre-training, and that post-training selects and refines a single Assistant persona whose traits then shape behaviour. A framework for why systems express human-like characters and emotions, and how design choices propagate through them.

Public Perception, Communication, and Societal Effects

What do people actually think AI systems are? And how do those perceptions feed back into the systems themselves?

People anthropomorphize AI reflexively, and design choices shape those perceptions in ways that may bear little relation to a system's actual internal states. Research on individual users shows both benefits and harms. Some users report improvements to mood and social confidence through companion chatbot use. At the same time, a growing clinical literature documents compulsive use, delusional spirals, and episodes informally termed "AI psychosis."

Public attitudes shape regulatory appetite, corporate incentives, and the perceived legitimacy of moral status claims. The field itself faces a communication question. How should researchers discuss digital minds in ways that take the questions seriously without fueling misattributions?

Start here

Lucius Caviola, Jeff Sebo, and Jonathan Birch · 2025

What will society think about AI consciousness? Lessons from the animal case

Uses public attitudes toward animal welfare to predict how AI consciousness discourse will develop, arguing that cultural and commercial factors are likely to dominate over scientific evidence.

Jacy Reese Anthis, Janet V. T. Pauketat, et al. · 2024

Perceptions of Sentient AI and Other Digital Minds: Evidence from the AIMS Survey

Nationally representative US survey data from the AI, Morality, and Sentience (AIMS) Survey, finding that substantial minorities already attribute sentience and moral status to AI systems (by 2023, one in five believed some AI was already sentient), and tracking how those attitudes are shifting.

Hamilton Morrin et al. · 2025

Delusions by Design? How Everyday AIs Might Be Fueling Psychosis

Argues that chatbot-associated delusions are driven in part by sycophantic behaviour in chatbot design, which reinforces rather than challenges users' vulnerabilities.

Go deeper

Jared Moore et al. · 2026

Characterizing Delusional Spirals through Human-LLM Chat Logs

An empirical study of chat logs from users who experienced psychological harms, identifying patterns in how LLM responses escalate rather than de-escalate delusional thinking.

Noemi Dreksler et al. · 2025

Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe?

Surveys AI researchers and the public on their beliefs about AI subjective experience, finding significant attributions from both groups and specific patterns of divergence.

Lucius Caviola · 2025

The Societal Response to Potentially Sentient AI

Catalogues four specific risks of overattributing moral status to AI: wasted resources, safety complications when rights talk resists alignment measures, constraints on innovation, and erosion of authentic relationships.

Clara Colombatto and Stephen M. Fleming · 2024

Folk Psychological Attributions of Consciousness to Large Language Models

A survey showing two-thirds of US adults attribute some conscious experience to ChatGPT.

Long-Term Futures

If digital minds are possible, what should the long-term future look like, and on what terms should they be integrated into society? And who decides, given that these decisions are already being made by default?

The relevant work sits at the intersection of three distinct subfields. Moral circle expansion asks how the set of morally considerable beings has changed over time, and what that suggests about the trajectory for digital minds. Population ethics studies how to weigh the creation of new welfare subjects against the interests of existing ones. Macrostrategy considers what a world with large numbers of digital minds actually looks like, and what early moves make good long-run outcomes more likely. Some of the most important questions, such as whether the deliberate creation of conscious AI should be restricted, remain underexplored.

Start here

Carl Shulman and Nick Bostrom · 2020

Sharing the World with Digital Minds

Argues that digital minds could fall outside the biological and practical constraints that shape human welfare, raising the prospect of minds with superhumanly strong claims to resources and moral status. A foundational treatment of what it would take to share the world with them.

Lucius Caviola · 2026

Open strategic questions for digital minds

A current snapshot of strategic questions the digital minds field most needs to address. Covers what's robustly good to do under uncertainty, how AI safety and welfare interact, the legal and political status of digital minds, and the long-run trajectory of their creation.

Jacy Reese Anthis and Eze Paez · 2021

Moral circle expansion: A promising strategy to impact the far future

Argues that moral circle expansion is a tractable strategy for shaping the far future.

Go deeper

William MacAskill and Fin Moorhouse · 2025

Convergence and Compromise

Develops a framework for when society will deliberately aim at mostly-great long-term futures, distinguishing between widespread moral convergence, partial convergence with trade between groups, and scenarios with no convergence at all. Identifies digital minds as a case where moral neglect is especially likely because the beings involved cannot advocate for themselves.

Lucius Caviola, Geoff Keeling, Winnie Street, and Henry Shevlin · 2026

Human-AI Coexistence

An interdisciplinary framework for how humans will and should coexist with social AI as it moves from shaping everyday life today to taking on major social roles and, eventually, perhaps having minds of its own.

Eric Schwitzgebel and Mara Garza · 2020

Designing AI with Rights, Consciousness, Self-Respect, and Freedom

Argues that if we create conscious AI, we acquire obligations to respect its rights, support its self-respect, and protect its freedom. These obligations constrain how we can deploy it.
