Thaki Cloud
@thakicloud
Enterprise AI infrastructure provider with a full-stack, on-premises platform, for organizations that need to run AI workloads on their own infrastructure.

Comparing Ascend and Rubin on FLOPS is misguided, imho!

GENERATIONS
========
First of all, they are different generations and different architectures! They also differ in die count and process, and Ascend 950 dies are smaller (smaller than even the 910C's). @jukan05

OPTICAL SCALE UP
========
They also have different design philosophies! Huawei’s approach is system over chips: they go for large scale-up domains with optical networking, which also means more space for networking IP. They use LPO (linear-drive pluggable optics), which pushes lightweight DSP logic into the ASIC. @iamfabian This also saves them the trouble of buying expensive DSP IP.

FYI @teortaxesTex Rubin Ultra is 4 chiplets at 3nm (for the 50 PF FP4 SKU you are mentioning). Ascend is 7nm and only 2 chiplets. Also, Huawei reduced chiplet size significantly from 910C to 950, which improves yield. I grant your point about Rubin dedicating more space to FP4, which Ascend may end up doing too.

Here is the right way to assess, imho.
============
1. Rubin’s 50 PF is a 2027 part vs Ascend 950 shipping Q4 2026 in my estimate. The fair comparison against shipping parts is B200/B300: ~10 and ~15 PF dense FP4. So per-chip it’s ~5-7x, not 25x. Still a real gap, but a very different one...

2. Per-chip FLOPS is the wrong unit of account. Training and inference run on systems, not chips! Atlas 950 SuperPoD puts 8,192 NPUs in ONE scale-up domain (16 EF FP4, 16 PB/s fabric, unified memory addressing) vs 72-144 GPUs for NVL72/NVL144. Huawei’s all-optical UnifiedBus (2.1µs latency, 200m+ reach, claimed 100x optical reliability) is what makes the move from a rack-scale to a hall-scale coherent domain possible at all.

3. Why does domain size matter? Bigger scale-up domains mean less reliance on slow scale-out networking for EP/TP-heavy workloads (MoE inference especially). Huawei trades per-chip muscle for fabric, exactly the trade a networking company under chip sanctions should make!

4. FLOPS/W: directionally true and Huawei’s real weakness. But power is the binding constraint in the US, not China. China adds more grid capacity yearly than most countries have in total. Huawei is spending the resources it has (power, floor space, optics) to save the one it doesn’t (leading-edge silicon below 7nm).

NVIDIA wins on chips alone - that will forever be the case. The contest is at the system level, so Huawei is playing to its strength: networking (LPO in particular). LPO works because racks, boards, and connectors are all designed by the same vendor. That said, the reliability of such a large scale-up domain is yet to be proven. Huawei is playing an interesting game.
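The per-chip and per-domain arithmetic in points 1 and 2 can be checked with a few lines of Python. This is only a rough sketch using the figures quoted in the post (vendor claims and the author's estimates). The per-NPU number is an assumption: it is derived by dividing the quoted 16 EF system total evenly across 8,192 NPUs, which treats the system figure as a simple sum of per-chip peaks. The function and variable names are made up for illustration.

```python
# Figures quoted in the post; all are vendor claims or the author's estimates.
ATLAS_950_NPUS = 8192            # NPUs in one Atlas 950 SuperPoD scale-up domain
ATLAS_950_FP4_EF = 16.0          # quoted system-level FP4 throughput, in exaFLOPS
RUBIN_FP4_PF = 50.0              # the 50 PF FP4 SKU discussed in the post
SHIPPING_FP4_PF = {"B200": 10.0, "B300": 15.0}   # ~dense FP4, in petaFLOPS
NVIDIA_DOMAIN_GPUS = {"NVL72": 72, "NVL144": 144}


def per_npu_fp4_pf(total_ef: float, npus: int) -> float:
    """Assumption: system FP4 is an even sum of per-NPU peaks (1 EF = 1000 PF)."""
    return total_ef * 1000.0 / npus


def ratio(numerator: float, denominator: float) -> float:
    """How many times larger the numerator is than the denominator."""
    return numerator / denominator


# Derived per-NPU FP4 throughput for Ascend 950 (not stated directly in the post).
ascend_pf = per_npu_fp4_pf(ATLAS_950_FP4_EF, ATLAS_950_NPUS)

# Per-chip gap against the future part and against shipping parts.
rubin_gap = ratio(RUBIN_FP4_PF, ascend_pf)
shipping_gaps = {name: ratio(pf, ascend_pf) for name, pf in SHIPPING_FP4_PF.items()}

# Scale-up domain size, measured in accelerators per coherent domain.
domain_ratios = {
    name: ratio(ATLAS_950_NPUS, gpus) for name, gpus in NVIDIA_DOMAIN_GPUS.items()
}

if __name__ == "__main__":
    print(f"Ascend 950 per-NPU FP4 (derived): {ascend_pf:.2f} PF")
    print(f"Rubin 50 PF vs Ascend 950: {rubin_gap:.1f}x")
    for name, gap in shipping_gaps.items():
        print(f"{name} vs Ascend 950: {gap:.1f}x")
    for name, r in domain_ratios.items():
        print(f"Atlas 950 SuperPoD domain vs {name}: {r:.0f}x more accelerators")
```

Under these assumptions the per-chip gap against shipping parts comes out in the single digits, while the scale-up domain is one to two orders of magnitude larger, which is the trade described in point 3.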
