CASES
Deployment Cases · Leaderboard
22 measured deployment recipes · table / scatter / bar-chart views · multi-dimensional filtering · CSV export
| Case | Decode tok/s | Prefill tok/s | TTFT p50 (ms) | TBT p50 (ms) | Date ↓ |
|---|---|---|---|---|---|
| DeepSeek R1 on 16× Ascend 910B with MindIE<br>`ascend-910b ×16 · deepseek-r1 · mindie · bf16` | 850 | 11,500 | 280 | 38 | 2026-04-28 |
| DeepSeek V4 Flash on 8× H100 SXM with vLLM FP8<br>`h100-sxm5 ×8 · deepseek-v4-flash · vllm · fp8-e4m3` | 4,200 | 38,000 | 220 | 14 | 2026-04-28 |
| DeepSeek V4 Pro on Huawei CloudMatrix 384 with MindIE<br>`ascend-910c ×384 · deepseek-v4-pro · mindie · bf16` | 2,400 | 38,000 | 380 | 32 | 2026-04-28 |
| Llama 3.3 70B on 8× A100 SXM4 80GB with vLLM<br>`a100-sxm4 ×8 · llama-3.3-70b · vllm · bf16` | 1,480 | 18,200 | 220 | 32 | 2026-04-28 |
| Llama 4 Scout on 8× H100 SXM with vLLM (public benchmark)<br>`h100-sxm5 ×8 · llama-4-scout · vllm · bf16` | 1,850 | 26,000 | 145 | 18 | 2026-04-28 |
| Qwen2.5-Coder 32B on 4× L40S with vLLM (FP8)<br>`l40s ×4 · qwen2.5-coder-32b · vllm · fp8-e4m3` | 580 | 5,400 | 480 | 55 | 2026-04-28 |
| DeepSeek V4 Flash with disaggregated prefill (H100) + decode (H200) via Mooncake<br>`h200-sxm ×16 · deepseek-v4-flash · sglang · fp8-e4m3` | 9,600 | 145,000 | 320 | 12 | 2026-04-27 |
| GLM-5.1 on 8× H200 SXM with vLLM BF16<br>`h200-sxm ×8 · glm-5.1 · vllm · bf16` | 2,400 | 28,000 | 280 | 22 | 2026-04-26 |
| Qwen3.6 Plus on 8× MI325X with SGLang FP8<br>`mi325x ×8 · qwen3.6-plus · sglang · fp8-e4m3` | 3,100 | 32,000 | 240 | 18 | 2026-04-26 |
| Llama 4 Maverick on TPU Trillium (v6e) 256-chip pod<br>`trillium ×256 · llama-4-maverick · vllm · bf16` | 5,800 | 72,000 | 180 | 14 | 2026-04-25 |
| Llama 4 Scout on 8× Hygon DCU K100 with vLLM<br>`dcu-k100 ×8 · llama-4-scout · vllm · bf16` | 850 | 12,500 | 320 | 42 | 2026-04-25 |
| Qwen3.5 397B Reasoning on 8× MI355X with FP4<br>`mi355x ×8 · qwen3.5-397b · vllm · fp4` | 4,500 | 52,000 | 220 | 12 | 2026-04-24 |
| DeepSeek V4 Flash on 16× MTT S4000 (Moore Threads KUAE)<br>`mtt-s4000 ×16 · deepseek-v4-flash · vllm · fp16` | 320 | 5,800 | 540 | 78 | 2026-04-23 |
| Kimi K2.6 on 16× Cambricon MLU590 (vLLM port)<br>`mlu590 ×16 · kimi-k2.6 · vllm · bf16` | 480 | 7,200 | 460 | 64 | 2026-04-22 |
| Llama 4 Scout on 8× MI300X with vLLM BF16<br>`mi300x ×8 · llama-4-scout · vllm · bf16` | 2,200 | 32,000 | 158 | 16 | 2026-04-22 |
| Qwen3.6 Plus on 8× Cambricon MLU590 with LMDeploy<br>`mlu590 ×8 · qwen3.6-plus · lmdeploy · int8` | 380 | 5,800 | 580 | 92 | 2026-04-22 |
| Gemma 4 26B on 4× H100 SXM with FP8<br>`h100-sxm5 ×4 · gemma-4 · tensorrt-llm · fp8-e4m3` | 6,800 | 78,000 | 95 | 8 | 2026-04-21 |
| GLM-5.1 on 8× Biren BR104 (export-control variant)<br>`br104 ×8 · glm-5.1 · vllm · int8` | 240 | 3,800 | 720 | 124 | 2026-04-20 |
| GPT-OSS on 8× Intel Gaudi 3 with vLLM<br>`gaudi-3 ×8 · gpt-oss · vllm · fp8-e4m3` | 2,900 | 35,000 | 140 | 18 | 2026-04-20 |
| DeepSeek V3 on AWS Trainium 2 (64-chip Trn2 instance)<br>`trainium-2 ×64 · deepseek-r1 · vllm · bf16` | 3,600 | 48,000 | 320 | 24 | 2026-04-19 |
| Gemma 4 on 4× MetaX 曦云 C500 with INT8<br>`metax-c500 ×4 · gemma-4 · vllm · int8` | 580 | 8,200 | 420 | 58 | 2026-04-18 |
| DeepSeek R1 on 16× Iluvatar 天垓 100 (Iluvatar IxRT)<br>`iluvatar-bi ×16 · deepseek-r1 · lmdeploy · int8` | 220 | 3,200 | 980 | 152 | 2026-04-15 |
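Raw decode throughput in the table above is not directly comparable across recipes, since chip counts range from 4 to 384. One rough normalization is decode tok/s per chip. A minimal sketch, using a few rows copied from the table (the per-chip figures are derived here, not part of the source data, and ignore batch size and quantization differences):

```python
# Per-chip decode throughput for a few recipes from the leaderboard.
# Tuples: (case tags, chip count, aggregate decode tok/s) — values
# taken verbatim from the table rows.
recipes = [
    ("h100-sxm5 ×8 · deepseek-v4-flash · vllm · fp8-e4m3", 8, 4200),
    ("ascend-910c ×384 · deepseek-v4-pro · mindie · bf16", 384, 2400),
    ("h100-sxm5 ×4 · gemma-4 · tensorrt-llm · fp8-e4m3", 4, 6800),
    ("trillium ×256 · llama-4-maverick · vllm · bf16", 256, 5800),
]

def per_chip(rows):
    """Return (case, decode tok/s per chip), best first."""
    scored = [(case, tok_s / chips) for case, chips, tok_s in rows]
    return sorted(scored, key=lambda r: r[1], reverse=True)

for case, rate in per_chip(recipes):
    print(f"{rate:8.2f} tok/s/chip  {case}")
```

By this metric the 4× H100 Gemma 4 recipe leads by a wide margin, while the large 256- and 384-chip pods look weak per chip; that is expected, since pod-scale deployments trade per-chip efficiency for aggregate capacity and large-batch serving.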