Chinese AI Models Overtake US Rivals on Cost and Global Demand
Chinese AI Models Surge in Global Usage, Fueled by Cost Edge and Agent Boom
A seismic shift is occurring in the global artificial intelligence landscape, as data indicates Chinese-built large language models (LLMs) are for the first time consuming more processing power than their American counterparts on major international platforms. This surge, primarily driven by overseas developers seeking extreme cost efficiency, coincides with the explosive rise of AI agent frameworks, creating a new and voracious demand for affordable computational "tokens."
Historic Reversal in Token Consumption

According to data from OpenRouter, a leading global aggregator platform for AI model APIs, a milestone was reached in late February. The total token consumption of the top ten models on the platform surpassed 28.7 trillion, with models of Chinese origin contributing over 14.69 trillion tokens. This marked the first time Chinese models' monthly token call share exceeded 50%, officially surpassing the volume attributed to U.S. models. For the week of February 16-22, Chinese models accounted for a staggering 61% of global calls on the platform, reaching 5.16 trillion tokens compared to 2.7 trillion for U.S. models. Four of the top five most-called models were Chinese: MiniMax's M2.5, Moonshot AI's Kimi K2.5, DeepSeek's V3.2, and Zhipu AI's GLM-5.
The lead proved brief, with U.S. models regaining ground the following week, but even a temporary overtaking is a significant signal. Notably, the user base driving this consumption is overwhelmingly international: U.S. developers constitute 47.17% of OpenRouter's users, while Chinese developers represent only 6.01%. Reports also suggest roughly 80% of U.S. AI startups use Chinese open-source models in their development processes. This indicates the push to the top is not domestic "self-celebration" but validation from the global developer community, particularly in Silicon Valley and Europe.
The fundamental driver is a stark economic equation. The competition has evolved from a singular focus on "which model is smarter" to a multi-dimensional contest balancing capability, cost, and scalability. "When consumption grows exponentially, the price advantage per token becomes a matter of survival," the industry analysis notes.
The Economics of "Cheaper Tokens"

Chinese models' systemic advantage begins with dramatically lower pricing. Research from Changjiang Securities highlights the gap: for input tokens, models like MiniMax M2.5 and Zhipu GLM-5 are priced at $0.3 per million tokens. In contrast, Anthropic's Claude Opus 4.6 charges $5, roughly 16.7 times higher. The disparity widens for output tokens. Alibaba's recently released Qwen 3.5 pushed prices even lower, to approximately $0.11 per million tokens for input, reportedly one-eighteenth the cost of Google's Gemini.
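The per-token arithmetic above can be made concrete with a short sketch. The prices are the figures quoted in this article (USD per million input tokens), used here purely for illustration, not as current list prices:

```python
# Input prices quoted above, in USD per million tokens (illustrative only).
INPUT_PRICE_PER_M = {
    "MiniMax M2.5": 0.30,
    "Zhipu GLM-5": 0.30,
    "Qwen 3.5": 0.11,
    "Claude Opus 4.6": 5.00,
}

def job_cost(model: str, input_tokens: int) -> float:
    """USD cost of sending `input_tokens` input tokens to `model`."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000

# The ~16.7x gap cited above:
ratio = INPUT_PRICE_PER_M["Claude Opus 4.6"] / INPUT_PRICE_PER_M["MiniMax M2.5"]
print(round(ratio, 1))  # 16.7

# A hypothetical 2-million-input-token workload: about $0.60 on the
# cheaper model versus about $10 on the pricier one.
print(job_cost("MiniMax M2.5", 2_000_000))
print(job_cost("Claude Opus 4.6", 2_000_000))
```

At chat-scale usage the absolute dollar amounts are trivial either way; the gap only becomes decisive at the agent-scale volumes discussed later in the article.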
This cost advantage is underpinned by foundational factors. Lower industrial electricity costs in China, estimated at 30-40% below U.S. levels—and as much as 50-70% lower for green energy in western regions—create a physical cost moat. Furthermore, sustained restrictions on accessing the most advanced AI chips since 2024 have forced Chinese AI firms to hone exceptional engineering efficiency, extracting maximum performance from available hardware.
Widespread adoption of the Mixture of Experts (MoE) architecture is another key technical contributor. This approach allows a model with hundreds of billions of parameters to activate only a small subset of "expert" networks for a given task, drastically reducing computational and power consumption for routine operations.
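The routing idea behind MoE can be shown in a minimal, pure-Python sketch. This is an illustrative toy, not any production model's architecture: a gating network scores every expert, but only the top-k experts actually run, so compute per token scales with k rather than with the total expert count:

```python
import math
import random

random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2

# Toy setup: each "expert" is a tiny linear map; the gate holds one
# scoring vector per expert. Real MoE layers use learned weights.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
gate = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def moe_forward(x):
    # Score all experts, but keep only the TOP_K highest-scoring ones.
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gate]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts' scores only.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the k active experts' outputs; the other
    # NUM_EXPERTS - TOP_K experts are never evaluated at all.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for d, y in enumerate(matvec(experts[i], x)):
            out[d] += w * y
    return out, top

y, active = moe_forward([1.0, -0.5, 0.3, 0.7])
print(len(active), "of", NUM_EXPERTS, "experts computed")
```

Here only 2 of 8 expert networks run per input, which is the source of the compute and power savings the paragraph above describes.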
The open-source ecosystem creates a powerful feedback loop. Over the past year, the global token consumption share of Chinese LLMs grew by 421%. A Stanford University report noted that from August 2024 to August 2025, Chinese developers contributed 17.1% of total downloads on Hugging Face, slightly edging out U.S. developers at 15.8%. This openness lowers the global barrier to entry and allows Chinese models to iterate rapidly based on widespread technical feedback.
"Over 50% of large model calls are completed via cheap open-source models. Chinese models are effectively supporting the majority of AI applications, to an extent that American counterparts cannot even substitute," commented Silicon Valley investor Aditya Agarwal.
The OpenClaw Catalyst and New Consumption Pathways

The surge in token demand finds a perfect catalyst in the rise of AI agents, exemplified by the open-source framework OpenClaw. Hailed by Nvidia CEO Jensen Huang as potentially "the most important software release of our era," OpenClaw transforms AI from a reactive chat tool into a proactive, autonomous "digital employee" capable of handling tasks across platforms like email, calendars, and code repositories.
This shift acts as a "token vacuum," dramatically increasing computational demand. Huang noted that intelligent agents could increase token consumption by about 1,000 times. An agent task can easily consume hundreds of thousands to millions of tokens, making per-token API cost a primary expense for developers. After Moonshot AI launched KimiClaw, a one-click deployment tool for OpenClaw, call volume for its Kimi K2.5 model surpassed its entire 2024 total within 20 days.
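The "~1,000 times" figure is easy to sanity-check with rough numbers. The token counts below are illustrative assumptions, not measurements:

```python
# Assumption: a single chat turn moves a few thousand tokens
# (prompt + reply), while an autonomous multi-step agent run can
# move millions across repeated tool calls and context re-reads.
chat_turn_tokens = 2_000
agent_run_tokens = 2_000_000

scaling = agent_run_tokens // chat_turn_tokens
print(scaling)  # 1000
```

At that scale, a per-million-token price difference that is negligible for chat becomes the dominant line item in an agent-heavy workload, which is why the cost gap described earlier matters so much here.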
This new demand landscape has exposed a strategic divergence between Western and Chinese tech giants. While companies like Google and Anthropic have moved to ban or restrict accounts using subscription models for heavy, automated API calls—citing cost and safety concerns—Chinese cloud providers like Alibaba Cloud, Tencent Cloud, and ByteDance's Volcano Engine have rushed to offer fully supported OpenClaw services.
For Chinese firms, the agent boom solves critical challenges. It consumes vast amounts of inference-side computing power, helping absorb idle capacity, and it creates stable, high-volume token consumption scenarios beyond seasonal, promotion-driven app downloads. Companies like MiniMax have integrated OpenClaw compatibility, while Xiaomi has released a phone-optimized version.
A Three-Layered Export Strategy

The global footprint of Chinese AI is no longer reliant on a single method but operates through a synergistic, three-layered structure.
The top layer remains application exports—consumer-facing apps that embed AI capabilities. ByteDance's Gauthmath, which captured 47% of the U.S. photo-based math solver market, and MiniMax's Talkie, an AI companion app popular globally, are prime examples. These apps monetize via subscriptions or ads but fundamentally run on and consume Chinese model tokens, building a vast user base.
The core commercial engine is the middle layer: API-based compute output. Through platforms like OpenRouter, overseas developers directly call Chinese model APIs, with inference occurring in Chinese data centers. This "selling water and electricity" model offers scalability and healthy margins. The strategic importance is underscored internally; at Moonshot AI, the API service team has reportedly been expanded into an independent business unit reporting directly to the president.
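Mechanically, this middle layer works because aggregators such as OpenRouter expose an OpenAI-compatible chat-completions API, so switching to a Chinese model is often just a change of model slug. The sketch below only builds the request payload (no network call); the model slug is a hypothetical example, and the endpoint reflects OpenRouter's documented API:

```python
import json

# OpenRouter's OpenAI-compatible chat-completions endpoint.
ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for an aggregator API."""
    return {
        "model": model,  # provider/model slug; value below is illustrative
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("deepseek/deepseek-chat",
                        "Summarize Mixture-of-Experts routing in one line.")
print(json.dumps(payload, indent=2))
```

Actually dispatching the request would POST this payload to `ENDPOINT` with a `Authorization: Bearer <API key>` header; inference then happens in the provider's data centers, which is exactly the "selling water and electricity" dynamic the article describes.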
The foundational layer is the open-source ecosystem. By fully open-sourcing model weights and toolchains, as seen with Alibaba's Qwen and DeepSeek, Chinese firms aim to embed their technology into the global developer's default toolkit. The goal is ecosystem lock-in: once developers build on an open-source model, they are more likely to use its commercial API later. The volume of derivative models uploaded based on Chinese open-source foundations now exceeds those based on mainstream U.S. models.
"Today's Chinese AI going global is no longer a single 'application export,' but a three-tiered structure," the analysis states. "Together, they illustrate that Chinese computing power is becoming the underlying infrastructure for global AI."
Persisting Challenges and the Enterprise Market Hurdle

Despite the momentum, significant challenges loom, particularly regarding geopolitical friction and enterprise adoption.
The consumer and developer markets, with short decision chains focused on price-performance, are currently ideal for Chinese models' value proposition. The enterprise market—encompassing government, finance, healthcare, and critical infrastructure—operates differently. Decisions involve long chains of compliance, data sovereignty, vendor stability audits, and brand trust.
Analysts warn that pure commercial advantages like low cost may be insufficient here. Morgan Stanley's chief China economist, Robin Xing, cautions against overstating the potential, drawing parallels to the 5G sector where Chinese equipment, despite technical and cost advantages, faced replacement in certain Western networks due to geopolitical and security concerns.
The U.S. is systematically constructing enterprise market barriers through investment screening, standard-setting, and data sovereignty rules, such as the proposed "Pax Silica" initiative. Furthermore, the ongoing uncertainty around export controls on advanced AI chips like Nvidia's H200 remains a persistent risk, potentially impacting the pace of high-end model iteration.
"While封锁 has a dual nature—spurring engineering optimization and domestic chip progress—it also poses a risk," notes a Galaxy Securities research report. "As the global model iteration cycle shortens to months, if core capability advancement slows, the cost advantage may rapidly lose its appeal in the high-end market."
For now, Chinese AI models have successfully capitalized on a perfect storm of architectural innovation, ruthless cost control, open-source strategy, and a timely shift towards agent-based computing. They have proven they can win the volume game in the global developer arena. The next, more formidable test will be whether they can translate that volumetric success into entrenched, trusted partnerships within the world's most regulated and geopolitically sensitive industries.