CASES

Deployment cases · leaderboard

22 reproducible deployment recipes · table / scatter / bar · multi-facet filter · CSV export

Case · Decode tok/s · Prefill tok/s · TTFT p50 ms · TBT p50 ms · Date
DeepSeek R1 on 16× Ascend 910B with MindIE
ascend-910b ×16 · deepseek-r1 · mindie · bf16
850 · 11,500 · 280 · 38 · 2026-04-28
DeepSeek V4 Flash on 8× H100 SXM with vLLM FP8
h100-sxm5 ×8 · deepseek-v4-flash · vllm · fp8-e4m3
4,200 · 38,000 · 220 · 14 · 2026-04-28
DeepSeek V4 Pro on Huawei CloudMatrix 384 with MindIE
ascend-910c ×384 · deepseek-v4-pro · mindie · bf16
2,400 · 38,000 · 380 · 32 · 2026-04-28
Llama 3.3 70B on 8× A100 SXM4 80GB with vLLM
a100-sxm4 ×8 · llama-3.3-70b · vllm · bf16
1,480 · 18,200 · 220 · 32 · 2026-04-28
Llama 4 Scout on 8× H100 SXM with vLLM (public benchmark)
h100-sxm5 ×8 · llama-4-scout · vllm · bf16
1,850 · 26,000 · 145 · 18 · 2026-04-28
Qwen2.5-Coder 32B on 4× L40S with vLLM (FP8)
l40s ×4 · qwen2.5-coder-32b · vllm · fp8-e4m3
580 · 5,400 · 480 · 55 · 2026-04-28
DeepSeek V4 Flash with disaggregated prefill (H100) + decode (H200) via Mooncake
h200-sxm ×16 · deepseek-v4-flash · sglang · fp8-e4m3
9,600 · 145,000 · 320 · 12 · 2026-04-27
GLM-5.1 on 8× H200 SXM with vLLM BF16
h200-sxm ×8 · glm-5.1 · vllm · bf16
2,400 · 28,000 · 280 · 22 · 2026-04-26
Qwen3.6 Plus on 8× MI325X with SGLang FP8
mi325x ×8 · qwen3.6-plus · sglang · fp8-e4m3
3,100 · 32,000 · 240 · 18 · 2026-04-26
Llama 4 Maverick on TPU Trillium (v6e) 256-chip pod
trillium ×256 · llama-4-maverick · vllm · bf16
5,800 · 72,000 · 180 · 14 · 2026-04-25
Llama 4 Scout on 8× Hygon DCU K100 with vLLM
dcu-k100 ×8 · llama-4-scout · vllm · bf16
850 · 12,500 · 320 · 42 · 2026-04-25
Qwen3.5 397B Reasoning on 8× MI355X with FP4
mi355x ×8 · qwen3.5-397b · vllm · fp4
4,500 · 52,000 · 220 · 12 · 2026-04-24
DeepSeek V4 Flash on 16× MTT S4000 (Moore Threads KUAE)
mtt-s4000 ×16 · deepseek-v4-flash · vllm · fp16
320 · 5,800 · 540 · 78 · 2026-04-23
Kimi K2.6 on 16× Cambricon MLU590 (with vLLM port)
mlu590 ×16 · kimi-k2.6 · vllm · bf16
480 · 7,200 · 460 · 64 · 2026-04-22
Llama 4 Scout on 8× MI300X with vLLM BF16
mi300x ×8 · llama-4-scout · vllm · bf16
2,200 · 32,000 · 158 · 16 · 2026-04-22
Qwen3.6 Plus on 8× Cambricon MLU590 with LMDeploy
mlu590 ×8 · qwen3.6-plus · lmdeploy · int8
380 · 5,800 · 580 · 92 · 2026-04-22
Gemma 4 26B on 4× H100 SXM with FP8
h100-sxm5 ×4 · gemma-4 · tensorrt-llm · fp8-e4m3
6,800 · 78,000 · 95 · 8 · 2026-04-21
GLM-5.1 on 8× Biren BR104 (export-control variant)
br104 ×8 · glm-5.1 · vllm · int8
240 · 3,800 · 720 · 124 · 2026-04-20
GPT-OSS on 8× Intel Gaudi 3 with vLLM
gaudi-3 ×8 · gpt-oss · vllm · fp8-e4m3
2,900 · 35,000 · 140 · 18 · 2026-04-20
DeepSeek R1 on AWS Trainium 2 (64-chip Trn2 instance)
trainium-2 ×64 · deepseek-r1 · vllm · bf16
3,600 · 48,000 · 320 · 24 · 2026-04-19
Gemma 4 on 4× MetaX Xiyun C500 with INT8
metax-c500 ×4 · gemma-4 · vllm · int8
580 · 8,200 · 420 · 58 · 2026-04-18
DeepSeek R1 on 16× Iluvatar Tiangai 100 (Iluvatar IxRT)
iluvatar-bi ×16 · deepseek-r1 · lmdeploy · int8
220 · 3,200 · 980 · 152 · 2026-04-15
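The CSV export can be post-processed offline, for example to re-rank cases by decode throughput. A minimal sketch, assuming hypothetical column names (`case`, `decode_tok_s`, `prefill_tok_s`, `ttft_p50_ms`, `tbt_p50_ms`, `date` are not confirmed by the export format); the two embedded sample rows are copied from the table above:

```python
import csv
import io

# Assumed export layout: header names here are hypothetical,
# but the two data rows are taken from the leaderboard above.
SAMPLE_CSV = """case,decode_tok_s,prefill_tok_s,ttft_p50_ms,tbt_p50_ms,date
DeepSeek V4 Flash on 8x H100 SXM with vLLM FP8,4200,38000,220,14,2026-04-28
Gemma 4 26B on 4x H100 SXM with FP8,6800,78000,95,8,2026-04-21
"""

def rank_by_decode(csv_text: str) -> list[dict]:
    """Parse an export and sort cases by decode throughput, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # Convert the numeric columns we sort or display on.
        row["decode_tok_s"] = int(row["decode_tok_s"])
        row["ttft_p50_ms"] = int(row["ttft_p50_ms"])
    return sorted(rows, key=lambda r: r["decode_tok_s"], reverse=True)

if __name__ == "__main__":
    for row in rank_by_decode(SAMPLE_CSV):
        print(f"{row['decode_tok_s']:>6} tok/s  TTFT {row['ttft_p50_ms']:>4} ms  {row['case']}")
```

The same parsed rows could feed the scatter or bar views; `csv.DictReader` keeps the script independent of column order, so only the assumed header names would need adjusting against a real export.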