The Chip War Inside China: Why Nvidia and Huawei Are Both Winning, Just at Different Things

The U.S. tried to lock China out of advanced AI chips. Instead, it accidentally forced China to build its own, and now both sides are stuck in a hardware stalemate neither fully controls.

When Jensen Huang, CEO of Nvidia, stood before cameras recently and said the demand in China is “so incredible,” he wasn’t being diplomatic. He was being honest about a market his company is slowly losing its grip on, not because its chips aren’t good enough, but because the rules of the game keep changing, and Washington is no longer the only one writing them.


Two Chips, Two Very Different Jobs

To understand what’s actually happening in China’s AI hardware market, you need to understand one thing first: not all AI computing is the same.

There are two distinct phases in building and deploying an AI model. The first is training: the enormously expensive, compute-heavy process of teaching a model from scratch using billions of data points. The second is inference: the moment you actually use the model, when it answers your question, generates your image, or summarizes your document.

These two tasks have completely different hardware requirements. And right now, Nvidia dominates training while Huawei is quietly winning inference, which is where most of the real-world commercial action happens.


What the Hardware Actually Looks Like Side by Side

Huawei’s flagship domestic AI chip, the Ascend 910C, and Nvidia’s China-available lineup (the downgraded H20 and the recently licensed H200) sit in very different places on the performance spectrum.

The Ascend 910C carries a massive 128GB of HBM3 memory, making it extraordinarily capable at holding large language models in active memory for inference. In raw compute, it delivers roughly 800 TFLOPS of FP16 performance, far ahead of Nvidia’s deliberately crippled H20, which was stripped down to just 296 TFLOPS to satisfy U.S. export caps.

But the H200, now licensed for sale to select Chinese entities, is a different story. It carries 141GB of even faster HBM3e memory at 4.8 TB/s bandwidth and nearly 989 TFLOPS of compute. On paper, that is roughly a quarter more raw compute than the 800 TFLOPS cited for Huawei’s best chip, paired with more and faster memory.
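
The figures quoted above can be lined up with a few lines of Python. This is only a rough sketch built from the numbers cited in this article: the function names are illustrative, the 2-bytes-per-parameter figure assumes FP16 weights, and the capacity estimate ignores KV cache and activations, so real-world headroom is smaller.

```python
# Spec figures as quoted in this article (not independently verified).
CHIPS = {
    "Ascend 910C": {"tflops": 800, "memory_gb": 128},
    "Nvidia H20": {"tflops": 296, "memory_gb": None},  # memory not quoted in the article
    "Nvidia H200": {"tflops": 989, "memory_gb": 141},
}

BYTES_PER_FP16_PARAM = 2  # assumption: weights stored in FP16


def compute_ratio(a: str, b: str) -> float:
    """Return quoted TFLOPS of chip `a` divided by quoted TFLOPS of chip `b`."""
    return CHIPS[a]["tflops"] / CHIPS[b]["tflops"]


def max_fp16_params_billions(chip: str) -> float:
    """Upper bound on parameters (in billions) whose FP16 weights fit in memory.

    Ignores KV cache, activations, and runtime overhead.
    """
    memory_gb = CHIPS[chip]["memory_gb"]
    if memory_gb is None:
        raise ValueError(f"no memory figure quoted for {chip}")
    # GB divided by bytes-per-parameter gives billions of parameters.
    return memory_gb / BYTES_PER_FP16_PARAM


if __name__ == "__main__":
    print(f"H200 vs 910C compute: {compute_ratio('Nvidia H200', 'Ascend 910C'):.2f}x")
    print(f"910C vs H20 compute:  {compute_ratio('Ascend 910C', 'Nvidia H20'):.2f}x")
    for name in ("Ascend 910C", "Nvidia H200"):
        print(f"{name}: up to ~{max_fp16_params_billions(name):.0f}B FP16 parameters")
```

Peak TFLOPS and raw capacity are only a first-order comparison; sustained throughput depends heavily on software and interconnect.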

Where Huawei falls furthest behind isn’t compute; it’s interconnect speed. Nvidia’s NVLink architecture lets chips talk to each other at 900 GB/s, meaning thousands of GPUs can function as a single unified supercomputer with almost no bottleneck. Huawei’s chip-to-chip communication is significantly slower. When you scale to 10,000 Ascend chips in a training cluster, data congestion becomes a real penalty, slowing the entire system during the communication-heavy phases of frontier model training.
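
To see why link speed matters so much at scale, here is a back-of-the-envelope sketch of a ring all-reduce, a common way to synchronize gradients during training. It is a bandwidth-only model that ignores latency, network topology, and overlap with compute. The 900 GB/s figure is the NVLink number quoted above; the payload size and the slower link speed are hypothetical placeholders, not Huawei specifications.

```python
def ring_allreduce_seconds(payload_gb: float, num_chips: int, link_gb_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce.

    In a ring all-reduce each chip sends (and receives) 2 * (N - 1) / N times
    the payload, so the time is that volume divided by per-chip link bandwidth.
    Latency, topology, and overlap with compute are ignored.
    """
    if num_chips < 2:
        return 0.0  # nothing to synchronize with a single chip
    volume_gb = 2 * (num_chips - 1) / num_chips * payload_gb
    return volume_gb / link_gb_s


if __name__ == "__main__":
    payload_gb = 140.0       # hypothetical gradient payload per synchronization
    num_chips = 10_000       # cluster size used as an example in the article
    fast_link_gb_s = 900.0   # NVLink figure quoted in the article
    slow_link_gb_s = 450.0   # hypothetical slower link, NOT a Huawei spec

    fast = ring_allreduce_seconds(payload_gb, num_chips, fast_link_gb_s)
    slow = ring_allreduce_seconds(payload_gb, num_chips, slow_link_gb_s)
    print(f"fast link: {fast:.3f} s per all-reduce")
    print(f"slow link: {slow:.3f} s per all-reduce ({slow / fast:.1f}x longer)")
```

In this simplified model, synchronization time scales inversely with link bandwidth, and the cost is paid on every training step. Real clusters layer several interconnects and are harder to model, but the direction of the effect is the same.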

That single gap explains almost everything about how Chinese companies are currently building their AI infrastructure.


How China’s Tech Giants Are Working Around It

Companies like ByteDance, Alibaba, Tencent, and DeepSeek don’t have the luxury of picking one supplier. U.S. export controls limit access to Nvidia’s best chips. Huawei can’t produce Ascend processors fast enough to meet the country’s massive demand. So they’ve done what engineers do when the tools don’t fit: they redesigned the workflow.

The most common approach is a clean functional split. Nvidia clusters handle training. The chips are kept tightly together in isolated zones where NVLink’s speed advantage minimizes bottlenecks during the massive mathematical workloads of frontier model development. Once a model is trained and finalized, it gets handed off to massive Huawei Ascend inference fleets, where large memory capacity matters far more than chip-to-chip communication speed, and where Huawei’s hardware is genuinely competitive.

Sophisticated orchestration software sits above all of this, routing jobs to the right hardware automatically. To the developer writing the code, the difference between an Nvidia GPU and a Huawei Ascend chip is increasingly invisible.
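
In its simplest form, the routing idea described above could look something like the sketch below. This is an illustration of the concept only, not any company’s actual scheduler; the pool names, `Job` type, and `route` function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class JobKind(Enum):
    TRAINING = "training"
    INFERENCE = "inference"


@dataclass(frozen=True)
class Job:
    name: str
    kind: JobKind


# Hypothetical pool names; the mapping mirrors the training/inference split
# described in the article.
POOL_FOR_KIND = {
    JobKind.TRAINING: "nvidia-training-cluster",
    JobKind.INFERENCE: "ascend-inference-fleet",
}


def route(job: Job) -> str:
    """Return the name of the hardware pool a job should be scheduled on."""
    return POOL_FOR_KIND[job.kind]


if __name__ == "__main__":
    for job in (Job("pretrain-run", JobKind.TRAINING), Job("chat-serving", JobKind.INFERENCE)):
        print(f"{job.name} -> {route(job)}")
```

A production orchestrator would also weigh queue depth, memory requirements, and chip availability, but the developer-facing idea is the same: submit a job and let the scheduler choose the silicon.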


DeepSeek Figured Out How to Make the Hardware Interchangeable

Perhaps the most significant development in this space isn’t hardware at all; it’s software strategy. DeepSeek’s rise to global prominence wasn’t just about building a capable model on a budget. It was about building a model specifically designed to run across incompatible hardware.

The key technique is Mixture of Experts (MoE) architecture. Rather than activating the full model for every query, MoE models only switch on the relevant specialized portions, roughly 3% of total parameters, for any given task. This dramatically reduces the compute burden per query.
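
The routing step at the heart of MoE can be sketched in a few lines of plain Python. This is a generic top-k gating illustration, not DeepSeek’s implementation; the function names, the gate scores, and the 64-expert, top-2 configuration are hypothetical choices made for the example.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw gate scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def active_fraction(num_experts: int, k: int, shared_fraction: float = 0.0) -> float:
    """Fraction of parameters used per token.

    `shared_fraction` is the share of parameters that always runs (attention,
    embeddings, and so on); the remainder is assumed to be split evenly
    across experts, of which only k are active.
    """
    return shared_fraction + (1.0 - shared_fraction) * k / num_experts


if __name__ == "__main__":
    # Hypothetical gate scores for a 4-expert layer.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
    # Hypothetical configuration: 64 experts, 2 active per token.
    print(f"active fraction: {active_fraction(64, 2):.1%}")
```

Only the selected experts run for a given token, which is where the compute savings come from. All experts still have to be stored somewhere, which is one reason large-memory inference chips pair well with this design.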

The result: a model that can be compiled and optimized to run on both Nvidia CUDA and Huawei CANN environments with relatively minimal friction. DeepSeek even built custom developer tooling like TileLang specifically to bridge the two ecosystems. By writing code that is silicon-agnostic, Chinese AI labs are doing something the rest of the industry hasn’t had to: breaking a dependency that most Western developers never even noticed they had.


Jensen Huang Is Optimistic, But the Market Has Already Changed

Huang has been candid about what the H200 licensing represents. He framed it as the Trump administration wanting America to lead the AI revolution and letting market economics do the work. His read: demand in China is so large that the market will eventually open, and both Xi Jinping and Premier Li Qiang have signaled China wants to remain a commercially open environment.

But the structural reality underneath that optimism is more complicated.

Nvidia’s market share in China has already fallen from a dominant 95% to under 60%, and the trajectory isn’t reversing. The company estimates China represents a potential $50 billion opportunity growing toward $67 billion by 2030. That’s not a market you can afford to slowly lose.

The newer friction isn’t coming from Washington. It’s coming from Beijing. China’s government is actively steering its domestic hyperscalers away from American silicon, not through outright bans, but through regulatory pressure, procurement guidelines favoring domestic suppliers, and the broader political priority of semiconductor self-sufficiency. Chinese customs recently blocked specific Nvidia gaming and AI-crossover chips, illustrating that clearing U.S. export hurdles no longer guarantees smooth entry into the Chinese market.

Satisfying Washington no longer means access to China. That’s a new and structurally uncomfortable reality for Nvidia.


The Unintended Consequence Washington Didn’t Plan For

U.S. export controls were designed to slow China’s AI development. The actual outcome has been more complicated than that.

By restricting access to Nvidia’s best chips, Washington forced Chinese engineers to mature their own hardware and software ecosystems at a pace they almost certainly wouldn’t have managed otherwise. Huawei’s Ascend line is now commercially viable at scale. Huawei’s CANN software framework, once a rough imitation of Nvidia’s CUDA, has been heavily optimized to support PyTorch and TensorFlow natively, with conversion tools that translate CUDA code into CANN-compatible code with manageable effort.

Domestic AI chip makers like Cambricon are experiencing rapid revenue acceleration. The competitive pressure that was supposed to keep China behind has instead become the forcing function for China’s semiconductor independence.

Nvidia’s real moat has never been just the chips; it’s been CUDA, the proprietary software ecosystem that the entire global AI development community has been trained on for over a decade. By pushing Chinese tech firms onto domestic hardware, the U.S. has inadvertently funded the maturation of a parallel software stack. Once that stack reaches stability at hyperscale, the barriers to fully replacing American hardware in Chinese data centers drop considerably.


Where This Is All Heading

The global AI supply chain is formally fracturing into two parallel tracks, and the split is likely permanent regardless of how trade policy evolves.

For frontier model training, Chinese firms will continue chasing U.S. chips through licensed channels, gray markets, and workarounds, because Nvidia’s interconnect advantage at scale is genuinely hard to replicate. The H200’s availability helps Nvidia here, but the window for locking in that dependency is narrowing.

For inference and deployment, the part of AI that actually touches hundreds of millions of daily users, domestic Chinese chips are already “good enough,” and getting better. That’s where commercial AI revenue actually lives: in the queries answered, the products powered, the services running at scale. Huawei is capturing that market right now, and no export license changes that.

Jensen Huang is right that demand in China is extraordinary. What he’s navigating is the possibility that by the time the market fully opens, the customers will have already built the infrastructure without him.


