DeepSeek V4 and the Ascend Puzzle
TL;DR: China's domestic AI compute base is larger than low outside estimates imply. DeepSeek V4 may be the first visible sign that near-frontier model development on Huawei Ascend is already happening.
China is much closer to domestic AI compute sufficiency than the consensus view holds. The idea that DeepSeek V4 was trained or materially developed on Ascend 950PR-class hardware is plausible enough that investors and AI researchers should take it seriously.
This is a mosaic thesis. No single clue proves it. The fit between the pieces is the point.
- 1. The SMIC numbers are more revealing than they look
- 2. The wafer math says low-end external estimates are too low
- 3. The token math points the same way
- 4. Ascend 950 appears to have existed earlier than the market realizes
- 5. DeepSeek's FP8 comment points to 950PR
- 6. A V4-on-Ascend timeline is more plausible than it looks
- 7. What the thesis is actually claiming
- Conclusion
- Sources
1. The SMIC numbers are more revealing than they look
The most useful clue in the SMIC data is not a headline wafer number. It is the category mix.
In SMIC's 2024 annual report:
- consumer electronics revenue share jumped from 25.0% to 37.8%
- computer & tablet fell from 26.7% to 16.6%
That matters because SMIC already breaks out phones and computer & tablet separately. Phones, laptops, and tablets are not hiding inside the consumer-electronics bucket. Once those are separated out, there are not many other obvious large-ticket advanced-node products left that can explain a swing this large. AI accelerators are the clearest candidate.
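To make the size of the mix shift concrete, here is a minimal sketch that turns the two share figures quoted above into percentage-point changes. The dictionary layout and the `point_change` helper are illustrative names, not anything taken from SMIC's reporting.

```python
# Segment revenue shares quoted in the text, as (before, after) in percent.
shares = {
    "consumer electronics": (25.0, 37.8),
    "computer & tablet": (26.7, 16.6),
}


def point_change(before: float, after: float) -> float:
    """Change in revenue share, in percentage points (rounded to 0.1)."""
    return round(after - before, 1)


changes = {name: point_change(b, a) for name, (b, a) in shares.items()}

for name, delta in changes.items():
    print(f"{name}: {delta:+.1f} percentage points")
```

The two moves are of similar magnitude in opposite directions, which is why the category mix, rather than any single headline number, carries the argument.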
I think this is one of the most important insights in the whole thesis: SMIC's own segment reporting suggests a larger AI-accelerator contribution than many outside observers realize.
The rest of the financial picture points in the same direction. Capacity kept expanding, 12-inch mix increased, CapEx stayed elevated despite margin pressure, and inventory rose sharply. That is not what a foundry trapped in a stagnant mature-node business looks like.
2. The wafer math says low-end external estimates are too low
The wafer math does not prove an exact 2025 Ascend shipment number. It does show that the smallest outside estimates are hard to reconcile with even conservative assumptions.
Start with the basic 910C economics:
- 910C is a dual-die package
- about 40 raw 910C equivalents fit on a 12-inch wafer
- effective output depends on die yield, packaging yield, and test yield
- a reasonable effective output range is roughly 8-16 good 910C-equivalents per wafer
The basic formula is:
monthly good 910C-equivalents = advanced-node WPM × allocation to Ascend × good units per wafer
I think this is the right place to stay realistic. A 50%-75% Ascend allocation is probably too high once you remember how many other strategically important chips China still needs to make. The more plausible adjustment is not extreme Ascend allocation. It is somewhat higher total advanced-node output combined with modest Ascend share.
Example wafer scenarios
| Scenario | Advanced-node WPM | Allocation to Ascend | Good 910C-equivalents per wafer | Implied monthly output |
|---|---|---|---|---|
| Conservative | 20,000 | 15% | 12 | 36,000 |
| Base | 25,000 | 20% | 12 | 60,000 |
| Higher yield | 25,000 | 25% | 16 | 100,000 |
| Higher capacity | 30,000 | 30% | 16 | 144,000 |
That is the core point. You do not need heroic Ascend-allocation assumptions to get outputs that already look much larger than the smallest outside estimates. Modest allocation plus somewhat higher advanced-node wafer output is enough.
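The scenario table can be reproduced with a few lines of Python. This is a sketch of the formula stated earlier (advanced-node WPM × allocation to Ascend × good units per wafer); the `WaferScenario` class and its field names are hypothetical, and the inputs are the table's own assumptions rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaferScenario:
    name: str
    advanced_node_wpm: int     # advanced-node wafers per month (assumed)
    ascend_allocation: float   # fraction of those wafers going to Ascend (assumed)
    good_units_per_wafer: int  # good 910C-equivalents per wafer (assumed)

    def monthly_good_units(self) -> int:
        """Monthly good 910C-equivalents implied by the three assumptions."""
        return round(
            self.advanced_node_wpm
            * self.ascend_allocation
            * self.good_units_per_wafer
        )


SCENARIOS = [
    WaferScenario("Conservative", 20_000, 0.15, 12),
    WaferScenario("Base", 25_000, 0.20, 12),
    WaferScenario("Higher yield", 25_000, 0.25, 16),
    WaferScenario("Higher capacity", 30_000, 0.30, 16),
]

outputs = {s.name: s.monthly_good_units() for s in SCENARIOS}

for name, units in outputs.items():
    print(f"{name:<16} {units:>8,} good 910C-equivalents/month")
```

Swapping in the low end of the 8-16 range for the conservative scenario (`WaferScenario("Low yield", 20_000, 0.15, 8)`) still gives 24,000 units per month, so the directional conclusion does not hinge on the yield assumption alone.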
The obvious caveat is packaging and HBM. Wafer starts are not the same thing as shipped accelerators. But that caveat cuts against exactness, not direction. Directionally, the supply picture still looks much larger than the low-end narrative.
3. The token math points the same way
The top-down demand picture corroborates the bottom-up supply picture.
The token tracker used for this research estimates China at roughly 180 trillion tokens per day by February 2026. A public summary of that estimate appears in a CLS/Futu writeup on China's AI token economy.
That converts to:
180,000,000,000,000 tokens/day / 86,400 seconds/day = ~2.08 billion tokens/sec
For hardware throughput, the cleanest public source I found is Huawei's CloudMatrix384 serving paper on arXiv. It reports DeepSeek-R1 inference on Ascend NPUs at:
- 1,943 decode tokens/sec per NPU in a throughput-oriented setting
- 538 decode tokens/sec per NPU in a lower-latency setting
Source: Serving Large Language Models on Huawei CloudMatrix384 (arXiv).
910C-equivalent fleet math
| Assumption set | Per-chip throughput | Utilization | Implied chips |
|---|---|---|---|
| Throughput-oriented floor | 1,943 t/s | 100% | ~1.07 million |
| Throughput-oriented realistic | 1,943 t/s | 50% | ~2.14 million |
| Latency-oriented floor | 538 t/s | 100% | ~3.87 million |
| Latency-oriented realistic | 538 t/s | 50% | ~7.74 million |
I would not read these literally as a physical 910C chip count. I would read them as 910C-equivalent compute math.
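The fleet table follows from one division. The sketch below reruns it under the same inputs: the 180 trillion tokens/day estimate, the two per-NPU decode rates from the CloudMatrix384 paper, and a utilization factor. The function names are illustrative, and treating all tokens as decode tokens served at a single per-chip rate is a simplifying assumption of this exercise, not a measured fact.

```python
TOKENS_PER_DAY = 180e12   # token-tracker estimate for February 2026
SECONDS_PER_DAY = 86_400


def tokens_per_second(tokens_per_day: float) -> float:
    """Average token rate implied by a daily total."""
    return tokens_per_day / SECONDS_PER_DAY


def implied_chips(rate: float, per_chip_tps: float, utilization: float) -> float:
    """910C-equivalent chips needed to sustain `rate` tokens/sec."""
    return rate / (per_chip_tps * utilization)


rate = tokens_per_second(TOKENS_PER_DAY)

# (label, decode tokens/sec per NPU, assumed utilization)
cases = [
    ("Throughput-oriented floor", 1_943, 1.0),
    ("Throughput-oriented realistic", 1_943, 0.5),
    ("Latency-oriented floor", 538, 1.0),
    ("Latency-oriented realistic", 538, 0.5),
]

fleet = {label: implied_chips(rate, tps, util) for label, tps, util in cases}

print(f"Average demand: {rate / 1e9:.2f} billion tokens/sec")
for label, chips in fleet.items():
    print(f"{label:<32} ~{chips / 1e6:.2f} million 910C-equivalents")
```

Because the result scales inversely with both throughput and utilization, halving either one doubles the implied fleet, which is the whole spread between the floor and realistic rows.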
That is enough to make one conclusion pretty clear: China's observed token economy is very hard to reconcile with a tiny domestic compute base plus a modest amount of smuggled Nvidia.
One nuance is worth stating explicitly. DeepSeek-R1 is a very large model, but it is also an MoE model, which makes it more efficient to serve than a dense model with the same headline parameter count. So using DeepSeek-R1 class throughput as a baseline is not obviously a bad choice for this exercise.

Figure: Top-down token demand and bottom-up supply-side wafer math point in the same direction: China's effective AI compute base appears much larger than the smallest outside estimates imply.
4. Ascend 950 appears to have existed earlier than the market realizes
The next piece of the puzzle is timing.
Huawei's public roadmap places:
- Ascend 910C in 2025 Q1
- Ascend 950PR in 2026 Q1
The 950 generation is a real architectural jump. It adds SIMD/SIMT, FP8, MXFP8, HiF8, and MXFP4. This is not just a faster 910C. It is the first disclosed part in the family that looks meaningfully more training-oriented.

Figure: Huawei Ascend roadmap. The key point for this thesis is that FP8 first appears with the 950 generation, not with 910C. Public roadmap context is available in Huawei's Huawei Connect 2025 keynote and secondary coverage cited below.
The roadmap matters. But the really interesting clue is that engineering-sample-era 950 hardware appears to have been photographed at Huawei Connect 2025 in September.
That is more important than people seem to appreciate. A roadmap slide can be aspirational. A photographed engineering sample means the hardware was physically real early enough to matter. And in this case it does not appear to be just one chip photo. The publicly circulated Huawei Connect material appears to show 950PR, 950DT, and a blade-server engineering sample. That is a much stronger signal than a lone package shot. It suggests Huawei was already showing pieces of the actual system stack, which makes earlier cluster readiness more plausible.

Figure: Publicly circulated Huawei Connect 2025 images widely interpreted as Ascend 950PR, Ascend 950DT, and a blade-server engineering sample. Read together, they point not just to a future chip on a roadmap, but to a hardware stack that was already tangible by September 2025. The original discussion trace is the Tieba thread cited below; Sohu published a secondary writeup covering the same event context.
There is also a Tieba post dated November 27, 2025 claiming ByteDance was already doing proof-of-concept (PoC) work on 950PR. The wording matters: it says ByteDance had been doing PoC work recently, which implies the effort was already underway by the time of the post, not that it started on November 27 itself. I would not build the whole thesis on that post. But as a corroborative clue, it fits the timeline unusually well.
Put differently: once 950 hardware is visibly real in September, partner access by late November stops sounding exotic.
5. DeepSeek's FP8 comment points to 950PR
This is where the DeepSeek-specific part tightens.
DeepSeek officially released V3.1 on August 21, 2025. Later, DeepSeek publicly said its UE8M0 FP8 format was designed for an "upcoming next-generation domestic chip" (36Kr).
That timing matters. If DeepSeek had already published a model stack using a 950-targeted FP8 format by August 21, then DeepSeek must have known about that hardware earlier. You do not invent a new numeric format for an unreleased chip family on the day you ship the model. You need advance visibility into the architecture, time to adapt the format, and time to train or at least materially develop a model around it. That pushes DeepSeek's knowledge of the 950 generation back before the public Huawei Connect roadmap event in September 2025.
Now compare that to the Huawei roadmap:
- 910C does not support FP8
- the first disclosed chip that does is the 950 generation
That narrows the field fast.
I think the right reading is:
- high confidence that DeepSeek was referring to the Ascend 950 generation
- medium-high confidence that the most likely target was 950PR
Why 950PR specifically?
- 910C can be ruled out on format support
- 960 and 970 are too far out
- 950DT sits later in the roadmap than 950PR
So the strongest version of this claim is not "DeepSeek was preparing for some vague future domestic accelerator." It is "DeepSeek was likely preparing for the Ascend 950 family, most likely 950PR."
6. A V4-on-Ascend timeline is more plausible than it looks
The key question is whether the hardware timeline and model timeline can fit together. I think they do.
The supply side is no longer the obvious objection. The wafer math above shows why China can plausibly make tens of thousands of 910C-equivalents per month under reasonable capacity and allocation assumptions. If the 950 generation draws on the same advanced-node wafer base at broadly similar yields, which is an assumption rather than something the SMIC data shows, comparable volumes of next-generation parts are plausible. If Huawei had 950PR-class silicon physically real by September 2025 and partner access was already plausible by late November, then the hardware side of the timeline stops looking forced.
The model side can then be read as one linear sequence:
- Late November to early December 2025: DeepSeek likely begins meaningful V4 work on 950PR-class hardware.
- Mid-January 2026: a 45-day pretraining run would finish around here if you use the reported V3 training duration as the baseline assumption.
- Late January to early February 2026: post-training, eval, and deployment prep.
- February 11, 2026: the first meaningful V4 checkpoint appears on the DeepSeek web app.
- February 13, 2026: a second checkpoint appears, suggesting iteration was continuing almost immediately after the first visible deployment.
- February 27, 2026: enthusiasts tracking the web model report another updated checkpoint with stronger benchmark behavior.
- March 2, 2026: a further apparent checkpoint update is reported by the same community trackers.
- March 4-12, 2026: China's annual "Two Sessions" political window runs in Beijing.
- After March 12, 2026: the publication window opens for a fuller V4 launch or public recognition event.
That middle stretch from February 11 to March 2 looks like continued post-training and checkpoint iteration, not a one-and-done release. One community-tracked benchmark chart shows exactly that pattern: DSv4lite-0211, 0213, 0227, and 0302 stepping upward over time, with the 0302 checkpoint substantially ahead of the early February versions.

Figure: Community-tracked DSv4lite checkpoints labeled 0211, 0213, 0227, and 0302, showing progressively stronger benchmark results over time. Read directionally, this looks like an active post-training / checkpoint-improvement cycle rather than a single frozen model snapshot.
We do not know if the tuning cycle is fully complete, but by this point it most likely is, or is close enough that the remaining question is publication timing rather than basic model readiness.
The political calendar matters here. China's 2026 Two Sessions run through March 12, 2026. As of March 11, 2026, that means we are still inside the political window rather than past it. If DeepSeek is going to do a broader V4 publication, the most natural window is after that event concludes, not in the middle of it.
Asterisk on the start date: I do not think December 1, 2025 should be read as a single precise start date. It is the midpoint of a reasonable range. The earliest plausible start is late November 2025, and the latest plausible start is early to mid December 2025. The point is not the exact day. The point is that once you combine the hardware timeline with a 45-day pretraining assumption, the overall sequence works.
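The date arithmetic behind that range is easy to check. The sketch below adds the 45-day pretraining assumption to three candidate start dates and measures the slack before the February 11 checkpoint. The specific days chosen for "late November" (November 24) and "mid December" (December 15) are hypothetical picks to bracket the range, not dates from any source.

```python
from datetime import date, timedelta

PRETRAINING_DAYS = 45                 # the text's baseline assumption
FIRST_CHECKPOINT = date(2026, 2, 11)  # first V4 checkpoint seen on the web app

# Candidate start dates; the earliest and latest are hypothetical bracket values.
start_dates = {
    "earliest": date(2025, 11, 24),
    "midpoint": date(2025, 12, 1),
    "latest": date(2025, 12, 15),
}

timeline = {}
for label, start in start_dates.items():
    pretraining_end = start + timedelta(days=PRETRAINING_DAYS)
    slack_days = (FIRST_CHECKPOINT - pretraining_end).days
    timeline[label] = (pretraining_end, slack_days)
    print(
        f"{label:<9} start {start}  pretraining ends {pretraining_end}  "
        f"{slack_days} days left for post-training before Feb 11"
    )
```

Under these picks, pretraining ends between January 8 and January 29, leaving roughly two to five weeks for post-training, eval, and deployment prep before the first visible checkpoint.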

Figure: A linear reconstruction of the hardware, training, checkpoint, and publication timeline for a possible DeepSeek V4-on-Ascend path.
7. What the thesis is actually claiming
The thesis is not:
- we know the exact 2025 Ascend chip count
- we know the exact DeepSeek training start date or cluster size
- we have definitive proof that V4 was fully pretrained end-to-end on 950PR
The thesis is:
- China's domestic AI compute base is materially larger than low-end external estimates imply
- Huawei had 950-generation hardware and system components physically real by September 2025
- DeepSeek likely had advance visibility into that hardware generation and was likely targeting it, most likely 950PR
- a DeepSeek V4 model materially developed on Ascend is a real possibility, not a fringe theory
That is enough.
Conclusion
China has not matched Nvidia across the board. That is not the claim.
The claim is that the outside picture is too conservative. China looks much closer to domestic AI compute sufficiency than many analysts assume, especially on inference, and DeepSeek V4 may be the first visible sign that domestic hardware is crossing into credible near-frontier model development.
Sources
- SMIC 2024 annual report (official PDF)
- SMIC 2023 annual report (official PDF)
- SMIC 2024 fourth-quarter results
- SMIC 2024 annual results announcement
- Reuters on China advanced-node output remaining under 20,000 wafers/month
- Reuters on Huawei 910C dual-die packaging and mass-shipment plans
- Reuters on SMIC still lacking scaled 5nm-equivalent output
- Huawei Connect 2025 keynote speech from Huawei
- Tom's Hardware on the Ascend roadmap shown at Huawei Connect 2025
- The Register on Huawei's multi-year Ascend roadmap
- DeepSeek official V3.1 release note dated August 21, 2025
- 36Kr on DeepSeek's UE8M0 FP8 clarification
- Serving Large Language Models on Huawei CloudMatrix384
- CLS/Futu public writeup summarizing Chinese token-demand estimates, including 180T/day by February 2026
- Xinhua on the fourth session of the 14th CPPCC National Committee running from March 4 to March 11, 2026
- The State Council / Xinhua on the fourth session of the 14th NPC opening March 5, 2026
- Tieba post on ByteDance 950PR PoC
- Tieba thread on Huawei Connect 950PR sample photos
- Sohu secondary writeup on Huawei Connect 950PR sample photos
- Community-tracked DeepSeek V4-lite checkpoint comparison chart, archived by the author from publicly circulated enthusiast posts