DeepSeek V4 and the Ascend Puzzle

By Max Song

TL;DR: China's domestic AI compute base is larger than low outside estimates imply. DeepSeek V4 may be the first visible sign that near-frontier model development on Huawei Ascend is already happening.

China is much closer to domestic AI compute sufficiency than consensus holds, and the possibility that DeepSeek V4 was trained or materially developed on Ascend 950PR-class hardware is plausible enough that investors and AI researchers should take it seriously.

This is a mosaic thesis. No single clue proves it. The fit between the pieces is the point.

1. The SMIC numbers are more revealing than they look

The most useful clue in the SMIC data is not a headline wafer number. It is the category mix.

In SMIC's 2024 annual report:

  • consumer electronics revenue share jumped from 25.0% to 37.8%
  • computer & tablet fell from 26.7% to 16.6%

That matters because SMIC already breaks out phones and computer & tablet separately. Phones, laptops, and tablets are not hiding inside the consumer-electronics bucket. Once those are separated out, there are not many other obvious large-ticket advanced-node products left that can explain a swing this large. AI accelerators are the clearest candidate.

I think this is one of the most important insights in the whole thesis: SMIC's own segment reporting suggests a larger AI-accelerator contribution than many outside observers realize.

The rest of the financial picture points in the same direction. Capacity kept expanding, 12-inch mix increased, CapEx stayed elevated despite margin pressure, and inventory rose sharply. That is not what a foundry trapped in a stagnant mature-node business looks like.

2. The wafer math says low-end external estimates are too low

The wafer math does not prove an exact 2025 Ascend shipment number. It does show that the smallest outside estimates are hard to reconcile with even conservative assumptions.

Start with the basic 910C economics:

  • 910C is a dual-die package
  • about 40 raw 910C equivalents fit on a 12-inch wafer
  • effective output depends on die yield, packaging yield, and test yield
  • a reasonable effective output range is roughly 8-16 good 910C-equivalents per wafer, an implied all-in yield of about 20-40%

The basic formula is:

monthly good 910C-equivalents = advanced-node WPM × allocation to Ascend × good units per wafer

I think this is the right place to stay realistic. A 50%-75% Ascend allocation is probably too high once you remember how many other strategically important chips China still needs to make. The more plausible adjustment is not extreme Ascend allocation. It is somewhat higher total advanced-node output combined with modest Ascend share.

Example wafer scenarios

| Scenario | Advanced-node WPM | Allocation to Ascend | Good 910C-equivalents per wafer | Implied monthly output |
| --- | --- | --- | --- | --- |
| Conservative | 20,000 | 15% | 12 | 36,000 |
| Base | 25,000 | 20% | 12 | 60,000 |
| Higher yield | 25,000 | 25% | 16 | 100,000 |
| Higher capacity | 30,000 | 30% | 16 | 144,000 |
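The scenarios above are simple enough to reproduce directly. This sketch just applies the formula from earlier in the section to the four scenario rows; all numbers are the article's assumptions, not measured data.

```python
# Hypothetical sketch of the wafer-math scenarios in the table above.
# Inputs are the article's assumptions, not measured shipment data.

def monthly_910c_equivalents(wpm, ascend_share, good_per_wafer):
    """monthly good 910C-equivalents =
       advanced-node WPM x allocation to Ascend x good units per wafer"""
    return wpm * ascend_share * good_per_wafer

scenarios = {
    "Conservative":    (20_000, 0.15, 12),
    "Base":            (25_000, 0.20, 12),
    "Higher yield":    (25_000, 0.25, 16),
    "Higher capacity": (30_000, 0.30, 16),
}

for name, (wpm, share, good) in scenarios.items():
    out = monthly_910c_equivalents(wpm, share, good)
    print(f"{name:16s} {out:>9,.0f} good 910C-equivalents/month")
```

Note that even the conservative row, with a 15% allocation, lands at 36,000 units per month, which is why the low-end external estimates are hard to defend.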

That is the core point. You do not need heroic Ascend-allocation assumptions to get outputs that already look much larger than the smallest outside estimates. Modest allocation plus somewhat higher advanced-node wafer output is enough.

The obvious caveat is packaging and HBM. Wafer starts are not the same thing as shipped accelerators. But that caveat cuts against exactness, not direction. Directionally, the supply picture still looks much larger than the low-end narrative.

3. The token math points the same way

The top-down demand picture corroborates the bottom-up supply picture.

The token tracker used for this research estimates China at roughly 180 trillion tokens per day by February 2026. A public summary of that estimate appears in a CLS/Futu writeup on China's AI token economy.

That converts to:

  • 180,000,000,000,000 tokens/day
  • / 86,400 seconds/day
  • = ~2.08 billion tokens/sec

For hardware throughput, the cleanest public source I found is Huawei's CloudMatrix384 serving paper on arXiv. It reports DeepSeek-R1 inference on Ascend NPUs at:

  • 1,943 decode tokens/sec per NPU in a throughput-oriented setting
  • 538 decode tokens/sec per NPU in a lower-latency setting

Source: Serving Large Language Models on Huawei CloudMatrix384 (arXiv).

910C-equivalent fleet math

| Assumption set | Per-chip throughput | Utilization | Implied chips |
| --- | --- | --- | --- |
| Throughput-oriented floor | 1,943 t/s | 100% | ~1.07 million |
| Throughput-oriented realistic | 1,943 t/s | 50% | ~2.14 million |
| Latency-oriented floor | 538 t/s | 100% | ~3.87 million |
| Latency-oriented realistic | 538 t/s | 50% | ~7.74 million |
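The fleet math above is a single division, and it is worth seeing how few moving parts it has. This sketch combines the 180-trillion-tokens-per-day estimate with the two CloudMatrix384 throughput figures; the utilization levels are assumptions for illustration.

```python
# Back-of-envelope 910C-equivalent fleet math from the token estimates above.
# 180T tokens/day and the per-NPU throughputs come from the cited sources;
# the utilization assumptions are illustrative.

TOKENS_PER_DAY = 180e12
tokens_per_sec = TOKENS_PER_DAY / 86_400   # ~2.08 billion tokens/sec

def implied_chips(per_chip_tps, utilization):
    """910C-equivalents needed to serve the estimated token demand."""
    return tokens_per_sec / (per_chip_tps * utilization)

for label, tps, util in [
    ("Throughput-oriented floor",     1_943, 1.0),
    ("Throughput-oriented realistic", 1_943, 0.5),
    ("Latency-oriented floor",          538, 1.0),
    ("Latency-oriented realistic",      538, 0.5),
]:
    print(f"{label:30s} ~{implied_chips(tps, util) / 1e6:.2f} million chips")
```

Every row of the table falls out of one ratio: tokens demanded per second over useful tokens delivered per chip per second.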

I would not read these literally as a physical 910C chip count. I would read them as 910C-equivalent compute math.

That is enough to make one conclusion pretty clear: China's observed token economy is very hard to reconcile with a tiny domestic compute base plus a modest amount of smuggled Nvidia.

One nuance is worth stating explicitly. DeepSeek-R1 is a very large model, but it is also an MoE model, which makes it more efficient to serve than a dense model with the same headline parameter count. So using DeepSeek-R1 class throughput as a baseline is not obviously a bad choice for this exercise.

Figure: Top-down token demand and bottom-up supply-side wafer math point in the same direction: China's effective AI compute base appears much larger than the smallest outside estimates imply.

4. Ascend 950 appears to have existed earlier than the market realizes

The next piece of the puzzle is timing.

Huawei's public roadmap places:

  • Ascend 910C in 2025 Q1
  • Ascend 950PR in 2026 Q1

The 950 generation is a real architectural jump. It adds SIMD/SIMT, FP8, MXFP8, HiF8, and MXFP4. This is not just a faster 910C. It is the first disclosed part in the family that looks meaningfully more training-oriented.

Figure: Huawei Ascend roadmap. The key point for this thesis is that FP8 first appears with the 950 generation, not with 910C. Public roadmap context is available in Huawei's Huawei Connect 2025 keynote and secondary coverage cited below.

The roadmap matters. But the really interesting clue is that engineering-sample-era 950 hardware appears to have been photographed at Huawei Connect 2025 in September.

That is more important than people seem to appreciate. A roadmap slide can be aspirational. A photographed engineering sample means the hardware was physically real early enough to matter. And in this case it does not appear to be just one chip photo. The publicly circulated Huawei Connect material appears to show 950PR, 950DT, and a blade-server engineering sample. That is a much stronger signal than a lone package shot. It suggests Huawei was already showing pieces of the actual system stack, which makes earlier cluster readiness more plausible.

Ascend 950PR engineering-sample-era hardware photographed at Huawei Connect 2025

Figure: Publicly circulated Huawei Connect 2025 images widely interpreted as Ascend 950PR, Ascend 950DT, and a blade-server engineering sample. Read together, they point not just to a future chip on a roadmap, but to a hardware stack that was already tangible by September 2025. The original discussion trace is the Tieba thread cited below; Sohu published a secondary writeup covering the same event context.

There is also a Tieba post dated November 27, 2025 claiming ByteDance was already doing PoC work on 950PR. The wording matters: it says ByteDance had been doing PoC work recently, which implies the effort was already underway by the time of the post, not that it started on November 27 itself. I would not build the whole thesis on that post. But as a corroborative clue, it fits the timeline unusually well.

Put differently: once 950 hardware is visibly real in September, partner access by late November stops sounding exotic.

5. DeepSeek's FP8 comment points to 950PR

This is where the DeepSeek-specific part tightens.

DeepSeek officially released V3.1 on August 21, 2025. Later, DeepSeek publicly said its UE8M0 FP8 format was designed for an "upcoming next-generation domestic chip" (36Kr).

That timing matters. If DeepSeek had already published a model stack using a 950-targeted FP8 format by August 21, then DeepSeek must have known about that hardware earlier. You do not invent a new numeric format for an unreleased chip family on the day you ship the model. You need advance visibility into the architecture, time to adapt the format, and time to train or at least materially develop a model around it. That pushes DeepSeek's knowledge of the 950 generation back before the public Huawei Connect roadmap event in September 2025.
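To make the hardware-specificity point concrete: under my reading of the public OCP microscaling (MX) description, an E8M0-style scale byte is all exponent, encoding a pure power of two with no sign and no mantissa bits. The decoder below is my own illustration of that format family, not DeepSeek's or Huawei's code, and the exact UE8M0 semantics on Ascend are an assumption.

```python
# Illustrative sketch of a UE8M0-style scale format.
# Assumption: per the public OCP MX description of E8M0, the byte encodes
# value = 2**(e - 127), unsigned, no mantissa, with 0xFF reserved for NaN.
# This is my own illustration, not DeepSeek's or Huawei's implementation.

def decode_ue8m0(byte: int) -> float:
    if not 0 <= byte <= 255:
        raise ValueError("UE8M0 is a single byte")
    if byte == 0xFF:
        return float("nan")        # reserved encoding
    return 2.0 ** (byte - 127)     # pure power-of-two scale factor

# A block scale of 1.0 encodes as 127; each increment doubles the scale.
print(decode_ue8m0(127))  # 1.0
print(decode_ue8m0(130))  # 8.0
```

A format this minimal only makes sense if you know exactly which block-scaling datapath the target silicon implements, which is why adopting it by August implies early visibility into the chip.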

Now compare that to the Huawei roadmap:

  • 910C does not support FP8
  • the first disclosed chip that does is the 950 generation

That narrows the field fast.

I think the right reading is:

  • high confidence that DeepSeek was referring to the Ascend 950 generation
  • medium-high confidence that the most likely target was 950PR

Why 950PR specifically?

  • 910C can be ruled out on format support
  • 960 and 970 are too far out
  • 950DT sits later in the roadmap than 950PR

So the strongest version of this claim is not "DeepSeek was preparing for some vague future domestic accelerator." It is "DeepSeek was likely preparing for the Ascend 950 family, most likely 950PR."

6. A V4-on-Ascend timeline is more plausible than it looks

The key question is whether the hardware timeline and model timeline can fit together. I think they do.

The supply side is no longer the obvious objection. The SMIC discussion above already shows why China can plausibly make tens of thousands of next-generation Ascend parts per month under reasonable capacity and allocation assumptions. If Huawei had 950PR-class silicon physically real by September 2025 and partner access was already plausible by late November, then the hardware side of the timeline stops looking forced.

The model side can then be read as one linear sequence:

  1. Late November to early December 2025: DeepSeek likely begins meaningful V4 work on 950PR-class hardware.
  2. Mid-January 2026: a 45-day pretraining run would finish around here if you use the reported V3 training duration as the baseline assumption.
  3. Late January to early February 2026: post-training, eval, and deployment prep.
  4. February 11, 2026: the first meaningful V4 checkpoint appears on the DeepSeek web app.
  5. February 13, 2026: a second checkpoint appears, suggesting iteration was continuing almost immediately after the first visible deployment.
  6. February 27, 2026: enthusiasts tracking the web model report another updated checkpoint with stronger benchmark behavior.
  7. March 2, 2026: a further apparent checkpoint update is reported by the same community trackers.
  8. March 4-12, 2026: China's annual "Two Sessions" political window runs in Beijing.
  9. After March 12, 2026: the publication window opens for a fuller V4 launch or public recognition event.

That middle stretch from February 11 to March 2 looks like continued post-training and checkpoint iteration, not a one-and-done release. One community-tracked benchmark chart shows exactly that pattern: DSv4lite-0211, 0213, 0227, and 0302 stepping upward over time, with the 0302 checkpoint substantially ahead of the early February versions.

Figure: Community-tracked DSv4lite checkpoints labeled 0211, 0213, 0227, and 0302, showing progressively stronger benchmark results over time. Read directionally, this looks like an active post-training / checkpoint-improvement cycle rather than a single frozen model snapshot.

We do not know if the tuning cycle is fully complete, but by this point it most likely is, or is close enough that the remaining question is publication timing rather than basic model readiness.

The political calendar matters here. China's 2026 Two Sessions run through March 12, 2026. As of March 11, 2026, that means we are still inside the political window rather than past it. If DeepSeek is going to do a broader V4 publication, the most natural window is after that event concludes, not in the middle of it.

Asterisk on the start date: I do not think December 1, 2025 should be read as a single precise start date. It is the midpoint of a reasonable range. The earliest plausible start is late November 2025, and the latest plausible start is early to mid December 2025. The point is not the exact day. The point is that once you combine the hardware timeline with a 45-day pretraining assumption, the overall sequence works.
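The date arithmetic behind that range is easy to check. This sketch applies the 45-day pretraining assumption (the reported V3 duration, used here as a baseline) to the three plausible start dates named above.

```python
# Checking the timeline arithmetic: a 45-day pretraining run starting in
# the late-November-to-mid-December window finishes in mid-January.
# The 45-day figure is the article's baseline assumption, not a known fact.
from datetime import date, timedelta

RUN = timedelta(days=45)
for start in (date(2025, 11, 24), date(2025, 12, 1), date(2025, 12, 12)):
    print(f"start {start} -> pretraining ends {start + RUN}")
```

All three start dates land the end of pretraining comfortably ahead of the February 11 checkpoint, leaving weeks for post-training and deployment prep.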

Figure: A linear reconstruction of the hardware, training, checkpoint, and publication timeline for a possible DeepSeek V4-on-Ascend path.

7. What the thesis is actually claiming

The thesis is not:

  • we know the exact 2025 Ascend chip count
  • we know the exact DeepSeek training start date or cluster size
  • we have definitive proof that V4 was fully pretrained end-to-end on 950PR

The thesis is:

  • China's domestic AI compute base is materially larger than low-end external estimates imply
  • Huawei had 950-generation hardware and system components physically real by September 2025
  • DeepSeek likely had advance visibility into that hardware generation and was likely targeting it, most likely 950PR
  • a DeepSeek V4 model materially developed on Ascend is a real possibility, not a fringe theory

That is enough.

Conclusion

China has not matched Nvidia across the board. That is not the claim.

The claim is that the outside picture is too conservative. China looks much closer to domestic AI compute sufficiency than many analysts assume, especially on inference, and DeepSeek V4 may be the first visible sign that domestic hardware is crossing into credible near-frontier model development.

Sources