The AI Moment of Truth for Chinese Censorship

In his now-classic 2018 book, AI Superpowers, Kai-Fu Lee threw down the gauntlet in arguing that China poses a growing technological threat to the United States. When Lee gave a guest lecture to my “Next China” class at Yale in late 2019, my students were enthralled by his provocative case: America was about to lose its first-mover advantage in discovery (the expertise of AI’s algorithms) to China’s advantage in implementation (big-data-driven applications).

Alas, Lee left out a key development: the rise of large language models and generative artificial intelligence. While he did allude to a more generic form of general-purpose technology, which he traced back to the Industrial Revolution, he didn’t come close to capturing the ChatGPT frenzy that has now engulfed the AI debate. Lee’s arguments, while making vague references to “deep learning” and neural networks, hinged far more on AI’s potential to replace human-performed tasks than on the possibilities for an “artificial general intelligence” that approaches human thinking. This is hardly a trivial consideration when it comes to China’s future as an AI superpower.

That’s because Chinese censorship inserts a big “if” into that future. In a recent essay, Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher – whose 2021 book hinted at the potential of general-purpose AI – make a strong case for believing we are now on the cusp of a ChatGPT-enabled intellectual revolution. Not only do they address the moral and philosophical challenges posed by generative large language models; they also raise important practical questions about implementation that hinge on the scale of the body of knowledge embedded in the language being processed.

It is precisely here that China’s strict censorship regime raises alarms. While there is a long and rich history of censorship in both the East and the West, the Communist Party of China’s Propaganda (or Publicity) Department stands out in its efforts to control all aspects of expression in Chinese society – newspapers, film, literature, media, and education – and steer the culture and values that shape public debate.

Unlike in the West, where almost anything goes on the web, China’s censors insist on strict political guidelines for CPC-conforming information dissemination. Chinese netizens are unable to pull up references to the decade-long Cultural Revolution, the June 1989 tragedy in Tiananmen Square, human-rights issues in Tibet and Xinjiang, frictions with Taiwan, the Hong Kong democracy demonstrations of 2019, pushback against zero-COVID policies, and much else.

This aggressive editing of information is a major pitfall for a ChatGPT with Chinese characteristics. By wiping the historical slate clean of important events and the human experiences associated with them, China’s censorship regime has narrowed and distorted the body of information that will be used to train large language models. It follows that China’s ability to benefit from an AI-driven intellectual revolution will suffer.

Of course, it is impossible to quantify the impact of censorship with any precision. Freedom House’s annual Freedom on the Net survey provides a qualitative assessment. In its 2022 edition, it gave China the lowest overall “Internet Freedom Score” in a 70-country sample.

This metric is derived from answers to 21 questions (and nearly 100 sub-questions) that are organized into three broad categories: obstacles to access, violations of user rights, and limits on content. The content category – reflecting filtering and blocking of websites, legal restrictions on content, the vibrancy and diversity of the online information domain, and the use of digital tools for civic mobilization – is the closest available proxy for the impact of censorship on the scale of searchable information. China’s score on this count was two out of 35 points, compared to an average score of 20.

Looking ahead, we can expect more of the same. The Chinese government has already moved quickly to issue draft rules on chatbots. On April 11, 2023, the Cyberspace Administration of China (CAC) proposed that generative AI content must “embody core socialist values and must not contain any content that subverts state power, advocates the overthrow of the socialist system, incites splitting the country or undermines national unity.”

This underscores a vital distinction between the pre-existing censorship regime and new efforts at AI oversight. Whereas the former uses keyword filtering to block unacceptable information, the latter (as pointed out in a recent DigiChina forum) must rely on a Whac-a-Mole approach to containing content that generative models produce afresh, and differently, in response to each prompt. This implies that the harder the CAC tries to control chatbot content, the smaller the resulting output of chatbot-generated Chinese intelligence will be, because every added restriction narrows what a model is permitted to say – yet another constraint on the AI intellectual revolution in China.

Unsurprisingly, the early returns on China’s generative-AI efforts have been disappointing. Baidu’s Wenxin Yiyan, or “Ernie Bot” – China’s best-known first-mover large language model – was recently criticized in Wired for attempting to operate in “a firewalled Internet ruled by government censorship.” Similarly disappointing results have been reported for other AI language models in China, including Robot, Lily, and Alibaba’s Tongyi Qianwen (roughly translated as “truth from a thousand questions”).

Moreover, a recent assessment by NewsGuard – an “internet trust tool” established and maintained by a large team of respected Western journalists – found that OpenAI’s ChatGPT-3.5 generated far more false, or “hallucinated,” information in Chinese than it did in English.

The literary scholar Jing Tsu’s remarkable book Kingdom of Characters: The Language Revolution that Made China Modern underscores the critical role that language has played in China’s evolution since 1900. In the end, language is nothing more than a medium of information, and in her final chapter, Tsu seizes on that point to argue that “Whoever controls information controls the world.”

In the age of AI, that conclusion raises profound questions for China. Information is the raw fuel of large language models. But state censorship encumbers China with small language models. This distinction could well prove decisive in the battle for information control and global power.