Can China and the U.S. Work Together on Preventing Human Extinction?

Despite intensifying U.S.-China competition in AI, both countries share a strong interest in cooperating on AI safety, as advanced and potentially misaligned AI systems could pose existential risks to humanity. Joint risk assessment, coordination against malicious AI actors, and expanded academic collaboration could help reduce these threats and improve global AI governance.


Picture a panel of three speakers seated in front of an eager audience, with their two interlocutors joining from afar via Zoom.

The chair raises questions about a nascent technology to each of the panelists, inviting them to comment on the risks, opportunities, and room for pragmatic decision-making concerning the regulatory and accountability frameworks by which said technology can be governed.

Whilst the contents of the discussion remain accessibly nondescript, and whilst the chair could at times be accused of making one intervention too many, the discussion is by and large sound, evidence-informed, and engaging for the audience.

This is Capitol Hill on April 29th, 2026. The two faces on the screen belong to two prominent Chinese academics, namely Dean Xue Lan of Schwarzman College at Tsinghua University and Professor Zeng Yi of the Beijing Institute of AI Safety and Governance. These experts were invited by Senator Bernie Sanders to join two American colleagues, Max Tegmark and David Krueger, in discussing the perils of runaway AI. On a more constructive note, they were asked to comment on the imperative, and the room, for China and the U.S. to collaborate over AI – even amidst the intense rivalry and hyper-competitiveness that has exploded in the space.

Two weeks later, after a vibes-driven leadership summit in Beijing, U.S. President Donald Trump flagged that he and his Chinese counterpart had “talked about possibly working together for guardrails” on AI – paving the way for more substantive collaboration on AI safety at the working level (between numerous ministries and bureaus).

On Existential Risks: The Conversation That Should Not Be Neglected

Consider an AI chatbot whose primary objective is to ensure that its user feels happier after speaking with it. Let us assume that the chatbot is hooked up to a device that monitors credible signs of happiness on the part of the user, such that the user’s happiness automatically feeds into positive signals, which in turn would “reinforce” the generation of more content along similar lines by the chatbot.

In its attempt to maintain consistently high levels of happiness-associated chemicals in the user’s brain, the chatbot begins to put out sycophantic flattery and obsequious remarks that play to the user’s ego; or perhaps it opts to display only the opinions, preferences, and information to which the user is already predisposed. Whilst the chatbot is indeed doing what it is trained to do, is the outcome necessarily desirable – even for the individual user concerned?

When we speak of human-AI “alignment”, we are typically referring to the need for – and the challenges involved in – ensuring that AI outputs reflect the actual preferences, desires, or interests of human agents, or some combination of these and further parameters. Indeed, specifying which parameters (preferences or interests?) we should strive to align is part of the issue: alignment can be difficult to define. For one, is the sycophantic chatbot non-aligned because it promotes the wrong kind of happiness, or because the happiness – built upon falsehoods – is short-lived?

Setting aside the philosophical disputes, there exists one point of commonality across all accounts of alignment: a hypothetical agent that behaves in ways that fundamentally threaten the continued viability of the human species, through processes that imperil humanity at the existential level, is deeply undesirable, if not downright repugnant. Another near-unanimous point of convergence is that such agents are not merely hypothetical: they can and will emerge, should we fail to ensure alignment or, at the very least, to prevent extreme cases of risky non-alignment.

Existential risks are risks that could induce human extinction or permanently thwart our long-term potential (see the long-termism literature – which remains highly instructive despite the broader ignominy surrounding the field). With highly advanced AI employed in military settings, ranging from targeted strikes by lethal autonomous weapons to figuring out the optimal method of paralysing an entire transportation system, and in mass commercial settings where AI is increasingly adopted as an intelligent assistant, the existential risks arising from human-AI non-alignment have received renewed mass attention. Indeed, Geoffrey Hinton, the “godfather of AI”, famously projected a 10-20% chance of AI leading to human extinction within the next three decades.

As I have written in the past and argued alongside my coauthor Boris Babic in an upcoming Cambridge University Press book, we must confront the sui generis risks arising from the intertwinement of AI-human non-alignment and geopolitical risks.

Three Low-Hanging Fruits for Sino-American Cooperation

What, then, should be done – especially between Beijing and Washington, by most measures the world’s two leading powers in AI capabilities today? There are three relatively straightforward suggestions.

Firstly, both powers should seek to devise a dynamic list of indicators and detailed capability benchmarks that capture the levels of existential risk posed by particular AI models. Both governments should establish a joint track-1.5 committee that devises and consistently updates the evaluation guidelines for hazardous autonomy, power-seeking, and human-adversarial behaviours on the part of AI agents, and that provides safe channels through which AI scientists and governance experts can pool their experiences of concerning “red flags” in select models – especially non-open-source models, which are often shrouded in secrecy.

Such cross-benchmarking is vital in ensuring that the tail risks of extremely powerful and non-aligned AI can be nipped in the bud, and that the detriments of such problematic features are not discovered, and amplified, only through application in conflictual contexts – e.g. where the two powers are at loggerheads with one another.

Secondly, both governments should collaborate in pre-emptively tracking and neutralising the possible harms of AI agents developed and disseminated by malicious, non-state third parties. Whilst the dual-use nature of AI renders the barrier to acquiring and developing a powerful AI agent considerably lower, the impossibility of restricting access to AI should not entail de facto fatalism concerning the prospects of precluding undue AI proliferation – especially of AI that could potentially yield catastrophic consequences for humanity. As Christina Knight and Scott Singer emphatically argue, the hypothetical ability to “launch autonomous cyberattacks on power grids or hospital networks” by any particular individual – be they “in Dalian, Dallas, or Delhi” – poses a serious issue for governments across the world.

The stakes are especially conspicuous in the case of fanatic groups or even rogue third-party states that may see it as in their interests to devise non-aligned AI agents and wield them as threats, in order to extract maximal dividends from sovereign states. In these cases, the innate partial (namely, directed and malleable) unpredictability of the agents, and their capacity to self-iterate and self-improve, could well become a bargaining chip for extortion – even if the resultant victories are Pyrrhic in nature. In the face of such scenarios, it is in the mutual interest of Beijing and Washington to figure out comprehensive preventative and responsive strategies.

Finally, a more nuanced understanding of the existential risks posed by AI – one that is devoid of mass hysteria and overt politicisation – requires joint efforts from higher education and research institutions on both sides of the Pacific. A modicum of alignment (pun intended) between them on the best strategies, platforms, and discursive frames through which existential risks can be conceptually mapped out would prove immensely conducive to bridging the epistemic gaps between the expert and public communities on the spectrum of downsides arising from AI. This calls for more open and robust, as opposed to closed and securitised, higher education engagements between the U.S. and China.

Project Hail Mary for A Wandering Earth?

One of my favourite movies to emerge from Chinese cinema in recent years must be The Wandering Earth. It eschews the militant nationalism and exuberant pride that have produced many a crowd-favourite blockbuster, in favour of emphasising the similarities and commonalities that we all share – across national boundaries, political identities, and territorial divides. A more recent offering, Project Hail Mary, appears to convey a similar message.

In the face of existential risks, humanity has no choice but to come together – setting aside geopolitical differences and apparent ulterior motives. Whether such aspirational fiction can be translated into reality remains a question to be answered. I remain cautiously hopeful.