The rapid advancements in machine learning models have generated excitement about the future of AI while also raising concerns about its implications. Following the popularity of text-to-image tools from Stability AI and OpenAI, the ability of ChatGPT to engage in intelligent conversations has become the latest obsession across various industries.
In China, where the tech community closely monitors progress in the West, entrepreneurs, researchers, and investors are seeking ways to make their mark in the generative AI space. Tech companies are developing tools based on open-source models to attract consumers and enterprise customers. Individuals are capitalizing on AI-generated content. Regulators have swiftly responded by defining the appropriate usage of text, image, and video synthesis. Additionally, US tech sanctions have raised concerns about China’s ability to keep pace with AI advancements.
With generative AI drawing global attention in late 2022, let’s explore how this disruptive technology is unfolding in China.
Chinese Adaptations
Viral platforms like Stable Diffusion and DALL-E 2 have thrust generative AI into the spotlight. Similarly, Chinese tech giants have captivated the public with their own equivalent products, tailored to suit local preferences and the political climate.
Baidu, renowned for its search engine and recent advancements in autonomous driving, operates ERNIE-ViLG, a 10-billion-parameter model trained on a dataset of 145 million Chinese image-text pairs. How does it compare to its American counterpart? Let’s consider the results for the prompt “kids eating shumai in New York Chinatown” provided to Stable Diffusion and the same prompt in Chinese (纽约唐人街小孩吃烧卖) for ERNIE-ViLG.
As someone familiar with eating dim sum in China and in Chinatowns, I would say it’s a tie. Neither model accurately depicts shumai, a succulent, half-open yellow dumpling filled with shrimp and pork that is commonly associated with dim sum. While Stable Diffusion captures the atmosphere of a Chinatown dim sum eatery, its representation of shumai falls short (although I understand its intention). ERNIE-ViLG, on the other hand, does generate a type of shumai, but one that resembles a variety more common in eastern China than the Cantonese version.
This quick test demonstrates the challenges of capturing cultural nuances when the underlying datasets are inherently biased. Stable Diffusion likely has more data on the Chinese diaspora, while ERNIE-ViLG may have been trained on a broader range of shumai images that are less common outside China.
Another noteworthy Chinese tool is Tencent’s Different Dimension Me, which transforms photos of people into anime characters. However, the AI generator exhibits its own biases. Originally intended for Chinese users, it unexpectedly gained popularity in other anime-loving regions like South America. Users soon discovered that the platform failed to identify Black and plus-size individuals, groups that are noticeably absent from traditional Japanese anime, leading to offensive AI-generated results.
Beyond ERNIE-ViLG, another large-scale Chinese text-to-image model is Taiyi, developed by IDEA, a research lab led by computer scientist Harry Shum. Shum is known for co-founding Microsoft Research Asia, Microsoft’s largest research branch outside the US. The open-source model is trained on 20 million filtered Chinese image-text pairs and contains one billion parameters.
Unlike profit-driven tech firms such as Baidu, IDEA is among the few institutions backed by local governments in recent years to focus on cutting-edge technologies. This likely grants the lab more research freedom without the pressure to prioritize commercial success. Based in the tech hub of Shenzhen and supported by one of China’s wealthiest cities, IDEA is an emerging entity worth monitoring.
Chinese tech companies may face challenges when it comes to accessing the best tools for training large neural networks. In September, the US government imposed export controls on high-end AI chips, which could affect Chinese AI startups engaged in basic research. Less powerful chips may result in longer computation times and higher costs. However, an enterprise software investor at a top Chinese VC firm, who asked not to be named, argued that these sanctions are actually pushing China to invest in advanced technologies in the long run.
Baidu, a company positioning itself as a leader in China’s AI field, believes that the impact of US chip sanctions on its AI business is “limited” in both the short and long term. According to Dou Shen, Baidu’s Executive Vice President and Head of AI Cloud Group, a significant portion of Baidu’s AI cloud business does not heavily rely on advanced chips. Moreover, in cases where high-end chips are required, Baidu has already stockpiled enough to support its business in the near term.
Looking ahead, Baidu has developed its own AI chip called Kunlun, which the executive confidently claims will play a role in the mid- to long-term future. By utilizing Kunlun chips in large language models, Baidu has achieved a 40% improvement in efficiency for text and image recognition tasks on its AI platform, resulting in a 20% to 30% reduction in total costs.
Only time will tell whether Kunlun and other indigenous AI chips will provide China with a competitive advantage in the generative AI race.