Distillation Debate Rages as China's AI Ambitions Face Scrutiny

Allegations of Model "Distillation" Ignite Debate on Chinese AI's Path to Competitiveness

A recent accusation by U.S. artificial intelligence firm Anthropic against several prominent Chinese AI labs has sparked a fierce industry debate, transcending simple allegations of intellectual property infringement to touch upon core questions about innovation, competition, and the technological trajectory of China's rapidly advancing AI sector.

In a detailed blog post, Anthropic alleged that three Chinese companies—DeepSeek, Moonshot AI (Yue Zhi An Mian), and MiniMax—operated a sophisticated, distributed network of tens of thousands of fake accounts, dubbed a "hydra cluster," to systematically extract data from its Claude models. The company claimed this activity, described as "distillation," involved over 16 million conversations generated in violation of its terms of service and regional access restrictions. Anthropic warned that models trained on such "illegally distilled" data could lack the original's safety guardrails, posing potential risks if deployed for malicious purposes.

The term "distillation" in AI refers to a technique where a smaller or weaker model is trained to mimic the outputs of a larger, more powerful one, potentially accelerating development. While common in global AI research, Anthropic framed the scale and method of these alleged operations as uniquely aggressive and problematic.
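In its textbook form (Hinton et al.), distillation trains the student to match the teacher's temperature-softened output distribution; in the API scenario Anthropic describes, only generated text would be available, so a student would instead be fine-tuned on that text. A minimal NumPy sketch of the logit-matching variant, with all values purely illustrative:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean per-example KL(teacher || student) on softened distributions,
    the classic knowledge-distillation objective."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) / p.shape[0])

# A student matching the teacher's distribution incurs ~zero loss;
# a mismatched student incurs a positive loss the optimizer would reduce.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]])
aligned = distillation_loss(teacher.copy(), teacher)
mismatched = distillation_loss(np.zeros_like(teacher), teacher)
```

The imitation-only character of this objective is what Lambert's later argument turns on: the student is rewarded solely for reproducing the teacher's outputs, never for exploring beyond them.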

A Nuanced View from the Research Community

The narrative, however, quickly encountered pushback from technical experts who urged a more measured analysis. Among them is Nathan Lambert, a scientist at the Allen Institute for AI and a respected voice in reinforcement learning from human feedback (RLHF). In an analysis responding to the allegations, Lambert argued that the situation is "both less severe and more complicated than it appears."

Lambert emphasized the need to disaggregate Anthropic's blanket accusation. According to the data presented, the scale of alleged activity varied dramatically: DeepSeek was linked to only 150,000 interactions, which Lambert suggested was a negligible volume likely representing a small internal experiment rather than a core training strategy. In contrast, MiniMax was accused of approximately 13 million interactions, with Moonshot AI involved in about 3.4 million. The combined output for the latter two is estimated at 150 to 400 billion tokens, representing a significant computational cost.
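As a rough check on that scale, the per-conversation token counts implied by these figures work out to long, multi-turn exchanges. All numbers below are the article's; the arithmetic itself is only illustrative:

```python
# Implied tokens per alleged interaction, using the figures cited above.
minimax_interactions = 13.0e6    # ~13 million (MiniMax)
moonshot_interactions = 3.4e6    # ~3.4 million (Moonshot AI)
total_interactions = minimax_interactions + moonshot_interactions  # 16.4M

low_tokens, high_tokens = 150e9, 400e9  # combined-output estimate, low / high

tokens_low = low_tokens / total_interactions    # ~9,100 tokens per interaction
tokens_high = high_tokens / total_interactions  # ~24,400 tokens per interaction
```

Roughly 9,000 to 24,000 tokens per conversation is consistent with lengthy reasoning transcripts rather than short chat queries, which fits the distillation-of-reasoning framing.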

More importantly, Lambert challenged the underlying assumption that "distillation" alone could explain Chinese models' recent advances. "Chinese AI companies have very good infrastructure, have produced a lot of innovation, and are working on hard technical problems," he noted, contending their results are not simply achieved by "taking shortcuts."

The Technical Ceiling of Distillation

Lambert's core argument delves into the technical limitations of the distillation process. While effective for transferring certain capabilities, he distinguishes it fundamentally from reinforcement learning (RL), a cornerstone of training state-of-the-art models.

"Distillation is imitation, learning the output of a strong model, copying the 'shape of its answers,'" Lambert explained. "RL is exploration; the model must reason, generate, iterate on errors extensively, and refine its abilities through trial and error."

He posits that the most powerful capabilities—particularly in complex reasoning, tool use, and agentic behavior—are not merely copied but emerge from a model's own exploratory learning process. Distillation might serve as a useful "warm-up," but the ceiling for a model trained primarily on another's outputs is inherently lower than one that has undergone intensive, self-directed RL. Furthermore, subtle differences in data distribution between models can introduce noise and limitations, making perfect replication via output copying technically fraught.

This perspective suggests that even if the alleged distillation occurred at scale, it would be insufficient to produce the high-performance models these Chinese companies have recently demonstrated. The debate, therefore, shifts from a simple story of copying to a more complex discussion about the blend of techniques used in a fiercely competitive field.

Commercial Breakouts Amidst the Controversy

The distillation controversy unfolds against a backdrop of remarkable commercial and technical momentum for China's leading AI firms, particularly those named in Anthropic's blog. This momentum presents a stark contrast to the narrative of dependency on external models.

In early 2026, the Hong Kong stock market witnessed a spectacular rally in shares of newly listed AI companies. MiniMax and Zhipu AI (智谱), both within months of their IPOs, saw their market capitalizations briefly surpass HK$300 billion (approximately US$38 billion), eclipsing established Chinese internet giants. This valuation surge, while raising eyebrows over traditional metrics, was fueled by investor excitement over tangible technical progress.

Zhipu AI's release of its GLM-5 model marked a significant milestone. Benchmark results showed it competing closely with top global proprietary models in coding, reasoning, and agentic capabilities. Crucially, the model incorporated advanced efficiency techniques like deep sparse attention mechanisms, allowing it to activate only a fraction of its 744 billion parameters during inference, dramatically lowering operational costs. Following strong user adoption, Zhipu AI raised prices for its GLM Coding Plan by over 30%.
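Per-token inference compute scales with the parameters actually activated, not the total held in memory, which is why sparse architectures cut serving costs so sharply. A back-of-envelope sketch; the active fraction below is a hypothetical placeholder, since the article gives only the 744-billion total:

```python
# Why sparse activation lowers inference cost: per-token compute scales
# with *active* parameters, not total. The active fraction is hypothetical.
total_params = 744e9       # GLM-5's reported total parameter count
active_fraction = 0.05     # ASSUMPTION: 5% of parameters active per token

active_params = total_params * active_fraction
# Dense-equivalent FLOPs per token ~ 2 * params (one multiply-accumulate per weight)
flops_dense = 2 * total_params
flops_sparse = 2 * active_params
savings = flops_dense / flops_sparse  # = 1 / active_fraction, i.e. 20x here
```

Under this assumed fraction, serving cost per token falls by a factor of twenty even though the full parameter set still occupies memory, which is the economic logic behind the efficiency claims above.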

Similarly, MiniMax found rapid traction with its lightweight, efficient M2.5 model, which quickly rose to the top of token usage charts on platforms like OpenRouter, driven by demand for AI agent applications.

This shift—from competing purely on parameter count or price to delivering models that users are willing to pay for—signals a maturing phase in China's AI commercialization journey.

The Major Tech Platforms' Aggressive Push

Simultaneously, China's internet giants are engaged in an expensive battle to integrate AI into their vast ecosystems and acquire users. During the 2026 Lunar New Year holiday, companies like ByteDance (with its Doubao AI), Alibaba (with Qwen), and Tencent canceled employee holidays to manage unprecedented AI demand fueled by massive incentive campaigns.

ByteDance reported 1.9 billion AI interactions on its Doubao platform on New Year's Eve alone. Alibaba invested an estimated RMB 3 billion to subsidize users making purchases via Qwen on its e-commerce platforms, reaching nearly 200 million users. Tencent leveraged hongbao (red packet) campaigns to drive monthly active users for its AI services to new heights.

For these giants, AI is not merely a new product line but a critical tool to reinforce and defend their core businesses—from e-commerce and payments to gaming and social media. Alibaba, for instance, has deeply integrated Qwen across Taobao, Alipay, and Amap, with rapid iteration cycles to improve usability.

The Unsustainable Arithmetic of Scale

Beneath the surface of technical breakthroughs and market euphoria lies a stark and persistent financial reality: crippling losses driven overwhelmingly by soaring compute costs. The path to cutting-edge AI is proving astronomically expensive, even as efficiency improves.

Financial disclosures reveal a daunting picture. From 2022 to the first half of 2025, Zhipu AI accumulated losses of RMB 6.24 billion. MiniMax reported losses of US$1.32 billion over a similar period. The primary culprit is not headcount but compute expenditure.

Analysis indicates that for both companies, over 50% of total spending, and as much as 70-80% of research and development costs, is consumed by computing power for model training and inference. For MiniMax in 2024, cloud computing costs related to training and inference reached 545% of its revenue. For Zhipu AI the same year, computing and related service fees were 506% of revenue. This means for every dollar earned, over five dollars is spent on compute.

Paradoxically, this is happening while the cost per individual AI inference is plummeting. Industry averages have fallen from around $20 per million tokens in late 2022 to under $0.10 by late 2024. However, the total compute bill continues to swell due to the exponential growth in model complexity, the rise of multi-modal systems, and, crucially, the explosion in usage from applications like AI agents. As models advance to the next generation, training costs typically increase by 3 to 5 times.
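The arithmetic behind this paradox is simple: total spend is unit price times volume, so even a ~200x fall in price per token is overwhelmed whenever usage grows faster still. A sketch using the article's prices; the token volumes are hypothetical illustrations:

```python
# Falling unit cost vs. exploding usage (prices from the article's figures;
# token volumes are hypothetical for illustration).
price_2022 = 20.00  # USD per million tokens, late 2022 industry average
price_2024 = 0.10   # USD per million tokens, late 2024
unit_cost_drop = price_2022 / price_2024  # ~200x cheaper per token

def total_bill(million_tokens, price_per_million):
    # Total spend = volume * unit price.
    return million_tokens * price_per_million

bill_2022 = total_bill(1_000, price_2022)    # ASSUMED 1B tokens at 2022 prices
bill_2024 = total_bill(500_000, price_2024)  # ASSUMED 500B tokens at 2024 prices
```

In this illustration a 500x jump in usage more than cancels a 200x price decline, so the total bill still grows, mirroring the dynamic driven by agents and multi-modal workloads.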

This creates a formidable challenge: commercial success and user growth directly intensify the financial strain, pushing profitability further into the future and raising questions about long-term sustainability without continuous access to capital.

Regulatory and Ethical Shadows

The industry's breakneck growth is also accompanied by growing pains in regulation and ethics. The release of powerful video generation models like Seedance 2.0 has already attracted legal challenges from copyright holders, leading to the removal of controversial features. The Anthropic allegations, whether fully substantiated or not, highlight the emerging global tensions around data usage, model provenance, and safety standards in a fragmented AI landscape. These issues represent another layer of risk for companies navigating both technological and commercial frontiers.

The current moment in Chinese AI is thus one of striking contrasts: celebrated technical achievements and soaring market valuations exist alongside serious allegations, severe financial pressures, and unresolved ethical questions. The "distillation" controversy, rather than defining the sector, has become a catalyst for a deeper examination of its true drivers. Evidence suggests that while competitive practices are intense, the sector's progress is increasingly built on substantial domestic R&D investments, architectural innovation for efficiency, and a costly, direct battle for user adoption and commercial integration. The journey beyond what some are calling "AI's childhood" is proving to be as financially perilous as it is technologically ambitious.
