PRICING / TCO

$ / M tokens ranking

Cost efficiency for each card is computed automatically from measured cases; the results update as the case library grows.

Formula

$/M tokens = (hw_rent_per_hour + tdp_w × PUE / 1000 × kWh_price) × 1,000,000 / (decode_tok_s_per_card × 3600)

assumptions:
  hw_rent_per_hour      = 2.50 USD / card / hour
  kWh_price             = 0.10 USD / kWh
  PUE                   = 1.3 (dimensionless ratio)
  tdp_w                 = vendor-rated TDP in watts, per hardware
  decode_tok_s_per_card = measured decode throughput per card (Tier 0 case)
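
The formula and assumptions above can be sketched as a small Python function. This is a minimal illustration, not the site's actual implementation: the function name `usd_per_million_tokens` and its parameter defaults are assumptions that mirror the listed values, and the example inputs (1,700 tok/s, 700 W) come from the H100 FP8 case in the table further down.

```python
def usd_per_million_tokens(
    decode_tok_s_per_card: float,
    tdp_w: float,
    hw_rent_per_hour: float = 2.50,  # USD / card / hour (page default)
    pue: float = 1.3,                # dimensionless (page default)
    kwh_price: float = 0.10,         # USD / kWh (page default)
) -> float:
    """Estimate USD per million decoded tokens for a single card."""
    if decode_tok_s_per_card <= 0:
        raise ValueError("decode_tok_s_per_card must be positive")
    # Hourly energy cost: watts -> kW, scaled by PUE, times the electricity price.
    energy_per_hour = tdp_w * pue / 1000 * kwh_price
    hourly_cost = hw_rent_per_hour + energy_per_hour
    tokens_per_hour = decode_tok_s_per_card * 3600
    return hourly_cost * 1_000_000 / tokens_per_hour


# H100 SXM5 FP8 case: 1,700 tok/s per card at 700 W TDP.
h100 = usd_per_million_tokens(decode_tok_s_per_card=1700, tdp_w=700)
print(f"${h100:.2f} per M tokens")  # $0.42 per M tokens
```

With the default assumptions the hourly cost for a 700 W card is 2.50 + 0.091 = 2.591 USD, which matches the $2.59 shown in the total $/h column.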

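⚠ This is a pure inference BoM estimate: it excludes data-center amortization, networking, operations, licenses, and similar costs. Real production $/M tokens is typically 1.5-3× this figure. Use it for cross-hardware comparison, not as an absolute procurement quote.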
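
As a rough illustration of that caveat, the sketch below scales a BoM estimate by the 1.5-3× band. The helper name `production_range` is hypothetical, and the multipliers are only the rule of thumb quoted above, not measured overheads.

```python
def production_range(bom_usd_per_m: float, low: float = 1.5, high: float = 3.0):
    """Scale a BoM-only $/M tokens estimate by the rule-of-thumb overhead band."""
    return bom_usd_per_m * low, bom_usd_per_m * high


# Best H100 figure from the ranking ($0.42 per M tokens, BoM only).
low_est, high_est = production_range(0.42)
print(f"${low_est:.2f} to ${high_est:.2f} per M tokens")  # $0.63 to $1.26 per M tokens
```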

Best cost per card (18 cards with measured data)

| # | Hardware | Best $/M | Median | Worst | Cases |
|---|---|---|---|---|---|
| 1 🏆 | NVIDIA H100 SXM5 80GB | $0.42 | $1.37 | $3.11 | 3 |
| 2 🏆 | NVIDIA H200 SXM 141GB | $1.20 | $2.40 | $2.40 | 2 |
| 3 🏆 | AMD Instinct MI355X | $1.32 | $1.32 | $1.32 | 1 |
| 4 | AMD Instinct MI325X | $1.89 | $1.89 | $1.89 | 1 |
| 5 | Intel Gaudi 3 | $2.01 | $2.01 | $2.01 | 1 |
| 6 | AMD Instinct MI300X | $2.62 | $2.62 | $2.62 | 1 |
| 7 | NVIDIA A100 SXM4 80GB | $3.83 | $3.83 | $3.83 | 1 |
| 8 | NVIDIA L40S | $4.88 | $4.88 | $4.88 | 1 |
| 9 | MetaX 曦云 C500 🇨🇳 | $4.88 | $4.88 | $4.88 | 1 |
| 10 | Hygon DCU K100 🇨🇳 | $6.74 | $6.74 | $6.74 | 1 |
| 11 | AWS Trainium 2 | $12.67 | $12.67 | $12.67 | 1 |
| 12 | Ascend 910B 🇨🇳 | $13.34 | $13.34 | $13.34 | 1 |
| 13 | Cambricon 思元 590 (MLU590) 🇨🇳 | $14.89 | $23.57 | $23.57 | 2 |
| 14 | Biren BR104 🇨🇳 | $23.51 | $23.51 | $23.51 | 1 |
| 15 | Google TPU Trillium (v6e) | $31.05 | $31.05 | $31.05 | 1 |
| 16 | Moore Threads MTT S4000 🇨🇳 | $35.53 | $35.53 | $35.53 | 1 |
| 17 | Iluvatar 天垓 100 🇨🇳 | $51.29 | $51.29 | $51.29 | 1 |
| 18 | Ascend 910C 🇨🇳 | $115.16 | $115.16 | $115.16 | 1 |
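
The best / median / worst columns can be reproduced by grouping per-case costs by hardware. The sketch below uses a handful of case values from this page. The use of `statistics.median_high` is an assumption inferred from the two-case rows (H200 and Cambricon), where the displayed median equals the worse value; the page does not state how it computes the median.

```python
from collections import defaultdict
from statistics import median_high

# (hardware id, $/M tokens) pairs taken from the per-case listing on this page.
cases = [
    ("h100-sxm5", 0.42), ("h200-sxm", 1.20), ("h100-sxm5", 1.37),
    ("h200-sxm", 2.40), ("h100-sxm5", 3.11),
    ("mlu590", 14.89), ("mlu590", 23.57),
]

# Group the per-case costs by hardware id.
by_card = defaultdict(list)
for card, cost in cases:
    by_card[card].append(cost)

# Assumption: upper median for even counts, inferred from the table.
summary = {
    card: {
        "best": min(costs),
        "median": median_high(costs),
        "worst": max(costs),
        "cases": len(costs),
    }
    for card, costs in by_card.items()
}

# Rank cards by their best (lowest) cost, ascending.
ranking = sorted(summary.items(), key=lambda item: item[1]["best"])
for rank, (card, s) in enumerate(ranking, start=1):
    print(rank, card, s["best"], s["median"], s["worst"], s["cases"])
```

Ranking by the best case rewards a card's most favorable model and precision pairing, so a card with a single measured case is not directly comparable to one with several.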

All cases, sorted by $/M tokens ascending (22)

| Case | Hardware ×N | Model / precision | Decode tok/s/card | TDP (W) | Total $/h | $/M tokens |
|---|---|---|---|---|---|---|
| Gemma 4 26B on 4× H100 SXM with FP8 | h100-sxm5 ×4 | gemma-4 · fp8-e4m3 | 1,700 | 700 | $2.59 | $0.42 🏆 |
| DeepSeek V4 Flash with disaggregated prefill (H100) + decode (H200) via Mooncake | h200-sxm ×16 | deepseek-v4-flash · fp8-e4m3 | 600 | 700 | $2.59 | $1.20 |
| Qwen3.5 397B Reasoning on 8× MI355X with FP4 | mi355x ×8 | qwen3.5-397b · fp4 | 563 | 1400 | $2.68 | $1.32 |
| DeepSeek V4 Flash on 8×H100 SXM with vLLM FP8 | h100-sxm5 ×8 | deepseek-v4-flash · fp8-e4m3 | 525 | 700 | $2.59 | $1.37 |
| Qwen3.6 Plus on 8× MI325X with SGLang FP8 | mi325x ×8 | qwen3.6-plus · fp8-e4m3 | 388 | 1000 | $2.63 | $1.89 |
| GPT-OSS on 8× Intel Gaudi 3 with vLLM | gaudi-3 ×8 | gpt-oss · fp8-e4m3 | 363 | 900 | $2.62 | $2.01 |
| GLM-5.1 on 8× H200 SXM with vLLM BF16 | h200-sxm ×8 | glm-5.1 · bf16 | 300 | 700 | $2.59 | $2.40 |
| Llama 4 Scout on 8× MI300X with vLLM BF16 | mi300x ×8 | llama-4-scout · bf16 | 275 | 750 | $2.60 | $2.62 |
| Llama 4 Scout on 8×H100 SXM with vLLM (public benchmark) | h100-sxm5 ×8 | llama-4-scout · bf16 | 231 | 700 | $2.59 | $3.11 |
| Llama 3.3 70B on 8× A100 SXM4 80GB with vLLM | a100-sxm4 ×8 | llama-3.3-70b · bf16 | 185 | 400 | $2.55 | $3.83 |
| Qwen2.5-Coder 32B on 4× L40S with vLLM (FP8) | l40s ×4 | qwen2.5-coder-32b · fp8-e4m3 | 145 | 350 | $2.55 | $4.88 |
| Gemma 4 on 4× MetaX 曦云 C500 with INT8 | metax-c500 ×4 | gemma-4 · int8 | 145 | 350 | $2.55 | $4.88 |
| Llama 4 Scout on 8× Hygon DCU K100 with vLLM | dcu-k100 ×8 | llama-4-scout · bf16 | 106 | 600 | $2.58 | $6.74 |
| DeepSeek V3 on AWS Trainium 2 (64-chip Trn2 instance) | trainium-2 ×64 | deepseek-r1 · bf16 | 56 | 500 | $2.56 | $12.67 |
| DeepSeek R1 on 16× Ascend 910B with MindIE | ascend-910b ×16 | deepseek-r1 · bf16 | 53 | 400 | $2.55 | $13.34 |
| Qwen3.6 Plus on 8× Cambricon MLU590 with LMDeploy | mlu590 ×8 | qwen3.6-plus · int8 | 48 | 350 | $2.55 | $14.89 |
| GLM-5.1 on 8× Biren BR104 (export-control variant) | br104 ×8 | glm-5.1 · int8 | 30 | 300 | $2.54 | $23.51 |
| Kimi K2.6 on 16× Cambricon MLU590 (with vLLM port) | mlu590 ×16 | kimi-k2.6 · bf16 | 30 | 350 | $2.55 | $23.57 |
| Llama 4 Maverick on TPU Trillium (v6e) 256-chip pod | trillium ×256 | llama-4-maverick · bf16 | 23 | 250 | $2.53 | $31.05 |
| DeepSeek V4 Flash on 16× MTT S4000 (Moore Threads KUAE) | mtt-s4000 ×16 | deepseek-v4-flash · fp16 | 20 | 450 | $2.56 | $35.53 |
| DeepSeek R1 on 16× Iluvatar 天垓 100 (Iluvatar IxRT) | iluvatar-bi ×16 | deepseek-r1 · int8 | 14 | 300 | $2.54 | $51.29 |
| DeepSeek V4 Pro on Huawei CloudMatrix 384 with MindIE | ascend-910c ×384 | deepseek-v4-pro · bf16 | 6 | 700 | $2.59 | $115.16 |

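Want to adjust the assumptions yourself? Open the calculator: in the TCO panel, both $/card/hour and TDP are editable, and custom model / hardware / parallelism configurations are also supported.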
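
To get a feel for what changing those inputs does, here is a hedged sketch that recomputes one case under a different rental price. The function `cost_breakdown` and the $4.00 / card / hour figure are hypothetical and chosen only for illustration; the 563 tok/s and 1,400 W inputs are the MI355X case from this page.

```python
def cost_breakdown(decode_tok_s_per_card, tdp_w,
                   hw_rent_per_hour=2.50, pue=1.3, kwh_price=0.10):
    """Return hourly cost, energy share of that cost, and $/M tokens for one card."""
    energy_per_hour = tdp_w * pue / 1000 * kwh_price  # USD / card / hour
    hourly_total = hw_rent_per_hour + energy_per_hour
    usd_per_m = hourly_total * 1_000_000 / (decode_tok_s_per_card * 3600)
    return {
        "hourly_total": hourly_total,
        "energy_share": energy_per_hour / hourly_total,
        "usd_per_m": usd_per_m,
    }


# MI355X case with the page's default assumptions.
default = cost_breakdown(563, 1400)
# Same case with a hypothetical rental price of $4.00 / card / hour.
custom = cost_breakdown(563, 1400, hw_rent_per_hour=4.00)

print(f"default: ${default['usd_per_m']:.2f}/M, energy share {default['energy_share']:.1%}")
print(f"custom:  ${custom['usd_per_m']:.2f}/M")
```

Under the default assumptions, energy is a small fraction of the hourly cost even for the 1,400 W card, so the ranking is driven mostly by the flat rental price and the measured decode throughput. Setting per-card rental prices is therefore likely to move the ordering more than adjusting TDP.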