Artificial Analysis (@ArtificialAnlys)
Google TPU v6e vs AMD MI300X vs NVIDIA H100/B200: Artificial Analysis’ Hardware Benchmarking shows NVIDIA achieving a ~5x tokens-per-dollar advantage over TPU v6e (Trillium), and a ~2x advantage over MI300X, in our key inference cost metric
In our key inference cost metric, Cost Per Million Input and Output Tokens at Reference Speed, NVIDIA H100 and B200 systems achieve lower overall cost than TPU v6e and MI300X. For Llama 3.3 70B running with vLLM at a Per-Query Reference Speed of 30 output tokens/s, NVIDIA H100 achieves a Cost Per Million Input and Output Tokens of $1.06, compared to MI300X at $2.24 and TPU v6e at $5.13.
This analysis relies on results of the Artificial Analysis System Load Test, which measures system inference throughput across a range of concurrency levels, and on GPU instance pricing data we collect from a range of GPU cloud providers. "Cost Per Million Input and Output Tokens at Reference Speed" takes the total throughput the system can sustain while maintaining at least 30 output tokens per second per query, and divides the system's hourly rental cost by that throughput (scaled to one million tokens).
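The calculation above can be sketched in a few lines. The throughput figure below is not a measured Artificial Analysis result: it is an illustrative placeholder back-solved so that an 8xH100 system at the $2.70/GPU/hour price quoted below reproduces the reported $1.06 figure.

```python
# Minimal sketch of "Cost Per Million Input and Output Tokens at Reference
# Speed": hourly rental cost divided by sustained throughput, scaled to 1M tokens.

def cost_per_million_tokens(hourly_price_usd: float,
                            system_tokens_per_second: float) -> float:
    """hourly_price_usd: rental price for the full 8-accelerator system ($/hour).
    system_tokens_per_second: total input+output tokens/s the system sustains
    while each query still receives >= 30 output tokens/s."""
    tokens_per_hour = system_tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Illustrative only: 8xH100 at $2.70/GPU/hour ($21.60/hour for the system),
# with a hypothetical ~5,660 total tokens/s at the reference speed.
print(round(cost_per_million_tokens(8 * 2.70, 5_660), 2))  # -> 1.06
```

Note that a cheaper hourly price does not guarantee a lower cost per token; the metric rewards throughput at the reference speed as much as it rewards price.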
Full results across a range of concurrency and speed levels are available on the Artificial Analysis Hardware Benchmarking page.
Important context:
➤ We are only reporting results for TPU v6e running Llama 3.3 70B because this is the only model on our hardware page for which vLLM on TPU is officially supported. We report results for NVIDIA Hopper and Blackwell systems, and now for AMD MI300X, across all four models on our hardware page: gpt-oss-120b, Llama 4 Maverick, DeepSeek R1 and Llama 3.3 70B.
➤ These results are based on what companies can rent in the cloud today: next-generation MI355X and TPU v7 accelerators are not yet widely available. We take the lowest price across a reference set of GPU cloud providers. TPU v6e is priced on-demand at $2.70 per chip per hour, which is cheaper than our lowest tracked price for NVIDIA B200 ($5.50 per hour) but similar to NVIDIA H100 ($2.70 per hour) and AMD MI300X ($2.00 per hour).
➤ Google’s TPU v7 (Ironwood) is becoming generally available in the coming weeks. We would anticipate TPU v7 outperforming v6e substantially, given leaps in compute (918 TFLOPS to 4,614 TFLOPS), memory (32GB to 192GB) and memory bandwidth (1.6 TB/s to 7.4 TB/s). However, we don’t yet know what Google will charge for these instances, so the impact on implied per-token costs is not yet clear.
➤ Our Cost Per Million Input and Output Tokens metric can’t be directly compared to serverless API pricing. The overall implied cost per million tokens for a given deployment depends on the per-query speed you target (driven by batch size/concurrency) and the ratio of input to output tokens.
➤ These results are all for systems with 8 accelerators, i.e. 8xH100, 8xB200, 8xTPU v6e, 8xMI300X.
We’ve also recently published updated Blackwell results - more analysis of these coming soon.