This repository contains keyword blocklists and lists of other content, such as URLs and images, used to trigger censorship in apps used in China. The WeChat, QQMail, Apple, and Bing lists were discovered using sample testing and thus do not completely cover the censored content on these platforms. The remaining lists in this repository were reverse engineered from the applications' software and are exhaustive lists of the keywords used to trigger censorship on those platforms.
The full details on data collection and analysis methods and results are available below.
The research below tracks daily changes to censorship in three different chat apps used in China: TOM-Skype, Sina UC, and Line. Overall, our chat app data consists of over 4,000 blocked keywords.
- Three Researchers, Five Conjectures: An Empirical Analysis of TOM-Skype Censorship and Surveillance
- Chat program censorship and surveillance in China: Tracking TOM-Skype and Sina UC
- Asia Chats: Investigating Regionally-based Keyword Censorship in LINE
Data: TOM-Skype and Sina UC, LINE
The research below tracks hourly changes to censorship in three different live streaming apps in China: YY, Sina Show, and 9158; and documents the keywords censored by GuaGua, which does not include a mechanism for downloading updates to its censorship blocklists. Overall, our live-streaming data consists of over 20,000 blocked keywords.
- Every Rose Has Its Thorn: Censorship and Surveillance on Social Video Platforms in China
- Harmonized Histories? A year of fragmented censorship across Chinese live streaming applications
- Censored Contagion: How Information on the Coronavirus is Managed on Chinese Social Media
Data: Original live-streaming data (2015), Updated live-streaming data (2017), Coronavirus keywords (2020)
Our research on mobile games analyzes domestic Chinese games as well as international games that have been altered to comply with Chinese regulations. Overall, we found hundreds of mobile games performing censorship, collectively censoring over 100,000 unique blocked keywords.
Data: Mobile games
This research analyzes Chinese censorship in open source projects. We extracted over 1,000 Chinese keyword blocklists from open source projects on GitHub, collectively spanning over 200,000 unique blocked keywords.
Data: Open source blocklists
Our research on WeChat censorship uses sample testing to determine what type of content, such as words, URLs, and images, can be communicated over the platform and which content is censored. We have studied what categorical content WeChat generally filters in addition to what content WeChat filters in response to specific events.
- One App, Two Systems: How WeChat uses one censorship policy in China and another internationally
- We (can’t) Chat: “709 Crackdown” Discussions Blocked on Weibo and WeChat
- Remembering Liu Xiaobo: Analyzing censorship of the death of Liu Xiaobo on WeChat and Weibo
- Managing the Message: What you can’t say about the 19th National Communist Party Congress on WeChat
- (Can’t) Picture This: An Analysis of Image Filtering on WeChat Moments (paper)
- Censored Contagion: How Information on the Coronavirus is Managed on Chinese Social Media
- Censored Contagion II: A Timeline of Information Control on Chinese Social Media During COVID-19
Data: Keywords and URLs (November 2016), 709 Crackdown keywords and images (April 2017), Liu Xiaobo keywords and images (July 2017), 19th Party Congress keywords (November 2017), Image filtering test data (May 2018), Coronavirus keywords (March 2020)
Our research measuring Apple's filtering of product engravings uses sample testing to discover keywords that cannot be engraved in each of six different regions: United States, Canada, Japan, Taiwan, Hong Kong, and mainland China. We found that part of Apple’s mainland China political censorship bleeds into both Hong Kong and Taiwan. Much of this censorship exceeds Apple’s legal obligations in Hong Kong, and we are aware of no legal justification for the political censorship of content in Taiwan.
Six months after our initial report, in a follow-up study, we found that Apple had eliminated its Chinese political censorship in Taiwan. However, Apple continued to perform broad, keyword-based political censorship outside of mainland China in Hong Kong, despite human rights groups’ recommendations for American companies to resist blocking content. As other tech companies do not perform similar levels of political censorship in Hong Kong, we assess possible motivations Apple may have for performing it, including appeasement of the Chinese government.
- Engrave Danger: An Analysis of Apple Engraving Censorship across Six Regions
- Engrave Condition: Apple’s Political Censorship Leaves Taiwan, Remains in Hong Kong
Data: Keyword filtering rules
On Tencent's QQMail, we discovered that the presence of certain combinations of keywords in an email message triggers censorship. However, the presence of other combinations, which we call extenuating combinations, deactivates the censorship of some censored keywords.
Data: Censored and extenuating keyword combinations
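The interaction between censored and extenuating combinations can be sketched in a few lines of Python. This is a minimal illustration, not QQMail's actual logic: the function names and the placeholder keywords are hypothetical, matching is assumed to be plain substring search, and the sketch simplifies by letting any extenuating combination excuse any triggered combination, whereas the findings above only say that extenuating combinations deactivate the censorship of some censored keywords.

```python
from typing import FrozenSet, Iterable

# A combination is a set of keywords that must all appear in a message.
Combination = FrozenSet[str]


def matches(text: str, combo: Combination) -> bool:
    """Return True when every keyword in the combination occurs in the text."""
    return all(keyword in text for keyword in combo)


def is_censored(
    text: str,
    censored: Iterable[Combination],
    extenuating: Iterable[Combination],
) -> bool:
    """Simplified model (an assumption): a message is censored when some
    censored combination matches and no extenuating combination matches."""
    triggered = any(matches(text, combo) for combo in censored)
    excused = any(matches(text, combo) for combo in extenuating)
    return triggered and not excused


# Placeholder keywords, not taken from the data set.
censored_rules = [frozenset({"alpha", "beta"})]
extenuating_rules = [frozenset({"gamma"})]

print(is_censored("alpha and beta", censored_rules, extenuating_rules))
print(is_censored("alpha only", censored_rules, extenuating_rules))
print(is_censored("alpha, beta, gamma", censored_rules, extenuating_rules))
```

Under these assumptions, the first message is flagged, the second is not because only part of the combination is present, and the third is not because the extenuating keyword is also present.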
MY2022, an app required to be installed by attendees of the 2022 Olympic Games, includes features that allow users to report “politically sensitive” content. We found that the app also includes a censorship keyword list, which, while presently inactive, targets a variety of political topics including domestic issues such as Xinjiang and Tibet as well as references to Chinese government agencies. It is unclear whether the list is inactive purposefully or in a bid to hide the extent of China’s censorship regime from outsiders.
Data: Inactive blocklist
Testing Microsoft Bing's censorship of autosuggestions, we find Chinese political censorship of suggestions for individuals' names, such as Xi Jinping, not only in China but also in North America. The findings in this report again demonstrate that an Internet platform cannot facilitate free speech for one demographic of its users while applying extensive political censorship against another demographic of its users.
Data: Censored names
Across the eight China-accessible search platforms we analyzed (Baidu, Baidu Zhidao, Bilibili, Microsoft Bing, Douyin, Jingdong, Sogou, and Weibo), we discovered over 60,000 unique censorship rules used to partially or totally censor search results returned on these platforms. We investigated different levels of censorship affecting each platform, which might either totally block all results or selectively allow some through, and we applied novel methods to unambiguously and exactly determine the rules triggering each of these types of censorship across all platforms. Among the web search engines Microsoft Bing and Baidu, Bing’s chief competitor in China, we found that, although Baidu has more censorship rules than Bing, Bing’s political censorship rules were broader and affected more search results than Baidu’s. Bing on average also restricted displaying search results from a greater number of website domains. These findings call into question the ability of non-Chinese technology companies to better resist censorship demands than their Chinese counterparts and serve as a dismal forecast concerning the ability of other non-Chinese technology companies to introduce search products or other services in China without integrating at least as many restrictions on political and religious expression as their Chinese competitors.
Data: Censorship rules from testing people's names from Wikipedia, Censorship rules from testing other platforms' rules, Ongoing censorship rules testing from news articles, Web search pre-authorized domains
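The difference between total and partial censorship of search results can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the rule format (sets of keywords that must all appear in the query), the function names, the example URLs, and in particular the idea that partial censorship keeps only results hosted on a list of pre-authorized domains. None of the platforms' actual implementations are described by this code.

```python
from typing import FrozenSet, Iterable, List, Set
from urllib.parse import urlparse

# Hypothetical rule format: all keywords in the set must appear in the query.
Rule = FrozenSet[str]


def rule_matches(query: str, rule: Rule) -> bool:
    """Return True when every keyword of the rule occurs in the query."""
    return all(keyword in query for keyword in rule)


def filter_results(
    query: str,
    results: Iterable[str],
    total_rules: Iterable[Rule],
    partial_rules: Iterable[Rule],
    authorized_domains: Set[str],
) -> List[str]:
    """Apply total censorship first, then partial censorship (assumed model)."""
    if any(rule_matches(query, rule) for rule in total_rules):
        return []  # total censorship: no results are shown
    if any(rule_matches(query, rule) for rule in partial_rules):
        # partial censorship: keep only results from pre-authorized domains
        return [
            url for url in results
            if urlparse(url).hostname in authorized_domains
        ]
    return list(results)  # no rule triggered: results pass through unchanged


# Placeholder rules, domains, and URLs.
results = ["https://news.example.org/a", "https://blog.example.com/b"]
total = [frozenset({"alpha"})]
partial = [frozenset({"beta", "gamma"})]
allowed = {"news.example.org"}

print(filter_results("alpha", results, total, partial, allowed))
print(filter_results("beta gamma", results, total, partial, allowed))
print(filter_results("delta", results, total, partial, allowed))
```

In this model, a query hitting a total rule returns nothing, a query hitting a partial rule returns only the result from the allowed domain, and any other query returns the full list.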
Users rely on online translation services to faithfully translate text to or from their native language without silently omitting sentences depending on those sentences’ ideas. However, in China, Internet censorship laws stifle what can be said politically or religiously. In this work, we analyzed the extent to which popular online translation services available in China censor their translations. We analyzed four services from Chinese companies (Alibaba, Baidu, Tencent, and Youdao) and one from an American company (Microsoft’s Bing Translate). Across the services, we found over 10,000 unique, automatically applied censorship rules and that all services implement automatic censorship rules that partially or completely omit content from users’ translations. Upon triggering censorship, the services will typically omit an offending line, sentence, or the translator’s entire output. All but one service, Alibaba, performed censorship silently and therefore possibly without the user’s knowledge. Our work reveals the unfortunate reality that, even if users in China have uncensored access to news or communications platforms, what they read or write may still be subject to automated censorship if they must translate between languages.
Data: Censorship data from the "people" test set, Censorship data from the "general" test set
We analyzed the system Amazon deploys on the US “amazon.com” storefront to restrict shipments of certain products to specific regions. We found 17,050 products that Amazon restricted from being shipped to at least one world region. While many of the shipping restrictions are related to regulations involving WiFi, car seats, and other heavily regulated product categories, the most common product category restricted by Amazon in our study was books. Banned books were largely related to LGBTIQ, the occult, erotica, Christianity, and health and wellness. The regions affected by this censorship were the UAE, Saudi Arabia, and many other Middle Eastern countries as well as Brunei Darussalam, Papua New Guinea, Seychelles, and Zambia. In our test sample, Amazon censored over 1.1% of the books sold on amazon.com in at least one of these regions. We identified three major censorship blocklists which Amazon assigns to different regions. In numerous cases, the resulting censorship is either overly broad or miscategorized. Examples include the restriction of books relating to breast cancer, recipe books invoking “food porn” euphemisms, Nietzsche’s Gay Science, and “rainbow” Mentos candy. To justify why restricted products cannot be shipped, Amazon uses varying error messages, such as one conveying that an item is temporarily out of stock. In misleading its customers and censoring books, Amazon is violating its public commitments to both LGBTIQ rights and, more broadly, human rights.
Data: Censorship data from Phase 1 of our experiment, Censorship data from Phase 2 of our experiment, Data backing Figures 12, 13, and 14
All data is provided under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International and available in full here and summarized here.