Google

Google TPU v5p

Proprietary · In production · Released 2023 · tpu-v5
BF16: 459 TFLOP/s (vendor claimed)
FP8: unsupported
FP4: unsupported
Memory: 95 GB (vendor claimed)
Memory bandwidth: 2765 GB/s (vendor claimed)
TDP: 700 W (vendor claimed)

Full specs

Compute

FP4 TFLOPS: unsupported
FP8 TFLOPS: unsupported
BF16 TFLOPS: 459
FP16 TFLOPS: 459
INT8 TOPS: 918

Memory

Capacity: 95 GB
Bandwidth: 2765 GB/s
Type: HBM2e

Die architecture 🟢 vendor floorplan

XPU count: 4
HBM stacks: 4
Process: 5 nm

Scale-Up (intra-node)

Protocol: ICI
Per-link BW: 4,800 Gbps
World size: 8960
Topology: 3D torus
Switch: —

Scale-Out (inter-node)

Per-card NIC: 100 Gbps
Protocol: DCN
NIC: —

Topology

Topology
8,960-card scale-up domain
Die-level architecture
[Die diagram] Google TPU v5p: 4× HBM stacks around 4 XPUs (darker block = tensor/matrix engine), per-XPU L1$/register file, shared L2 cache + NoC · 459 TFLOPS BF16 · 95 GB HBM2e @ 2.8 TB/s · 700 W TDP

🟢 vendor floorplan 4 XPUs · 4× HBM · 5 nm


Cluster topology · ICI @ 4,800 Gbps
[Cluster diagram] Super-pod (rack-scale) · 8,960 cards in a single scale-up domain · 4,800 Gbps per ICI link
Scale-Up (intra-domain)
ICI
4,800 Gbps · topology: 3D torus
world_size = 8960
Scale-Out (cross-domain)
DCN
100 Gbps NIC per card
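The scale-up figures above are enough for a back-of-envelope collective cost. Below is a minimal sketch assuming the textbook ring all-reduce traffic model (each rank moves 2(n-1)/n bytes per payload byte) and this page's default 0.5 efficiency factor; the function name and example payload are illustrative, not taken from the site's calculator.

```python
# Back-of-envelope ring all-reduce over ICI (sketch, not a measurement).
# Assumes the standard ring cost model: each rank moves 2*(n-1)/n bytes of
# traffic per payload byte, at an assumed fraction of peak link bandwidth.

ICI_GBPS = 4_800                       # per-chip ICI bandwidth, vendor claimed
ICI_BYTES_PER_S = ICI_GBPS / 8 * 1e9   # -> 600 GB/s

def ring_allreduce_s(payload_bytes: float, ranks: int,
                     efficiency: float = 0.5) -> float:
    """Estimated wall time for a ring all-reduce of payload_bytes."""
    traffic_per_rank = 2 * (ranks - 1) / ranks * payload_bytes
    return traffic_per_rank / (ICI_BYTES_PER_S * efficiency)

# Illustrative payload: BF16 activations for a 32k-token batch at hidden=8192.
print(f"{ring_allreduce_s(32_768 * 8_192 * 2, ranks=8) * 1e3:.2f} ms")
```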

Which models can it run?

Quick estimates · decode tok/s/card upper bound

TP=8 · BF16 · batch=16 · prefill=1024 · decode=256 · efficiency calibration applied

Adjust in calculator → (a simplified bandwidth sketch follows the table)
Model | Vendor | Params (active) | Decode tok/s/card | Bottleneck
DeepSeek V4 Pro | deepseek | 49B | — | out of memory
DeepSeek V4 Flash | deepseek | 13B | 79 | memory bandwidth
Mistral Small 4 | mistral | 22B | 36 | memory bandwidth
GLM-5 Reasoning | zhipu | 32B | 30 | memory bandwidth
GLM-5.1 | zhipu | 32B | — | out of memory
Qwen3.6 Plus | alibaba | 35B | — | out of memory
Kimi K2.6 | moonshot | 32B | — | out of memory
MiniMax M2.7 | minimax | 46B | — | out of memory
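To see where the table's numbers and its "out of memory" rows come from, here is a minimal weights-streaming sketch. It models only per-step weight traffic at TP=8; the site's calculator additionally counts KV-cache reads, total (not just active) parameters for MoE models, and measured efficiency calibration, so its figures are lower. Names and the example model size are illustrative.

```python
# Decode upper-bound sketch: at TP=8 each card streams its weight shard from
# HBM once per decode step, so memory bandwidth caps steps/s. Weights-only
# model; KV-cache traffic and MoE total-parameter footprints (which drive
# the "out of memory" rows above) are deliberately omitted.

HBM_BW_GBS = 2_765      # GB/s per card, vendor claimed
HBM_CAP_GB = 95         # GB per card, vendor claimed

def decode_tok_s_per_card(weight_gb_total: float, tp: int = 8,
                          batch: int = 16, efficiency: float = 0.5):
    """Upper bound on decode tok/s/card; None if the weight shard won't fit."""
    shard_gb = weight_gb_total / tp
    if shard_gb > HBM_CAP_GB:
        return None                                 # -> "out of memory"
    step_s = shard_gb / (HBM_BW_GBS * efficiency)   # one decode step
    return batch / (step_s * tp)                    # normalize per card

# Example: a dense 13B model in BF16 (2 bytes/param -> 26 GB of weights).
print(decode_tok_s_per_card(26.0))
```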

Operator-level fit · per-model bottleneck + upper bound

Operator-level fit (per-token roofline)

Computed from each model's operator_decomposition and this card's BF16 459 TFLOPS / 2,765 GB/s · ridge point ≈ 166 FLOPs/byte

Upper bound = min(compute roof, memory-bandwidth roof) · efficiency not applied (see the roofline sketch after the table)
Model | Domain | Dominant operator | AI (FLOPs/byte) | Bottleneck | tok/s upper bound
DeepSeek V4 Pro | llm | matmul | 245.5 | 🔥 compute | 76k
GraphCast | scientific | graph-message-passing | 0.9 | 💾 memory bandwidth | 5101
AlphaFold 3 | scientific | pair-bias-attention | 2.3 | 💾 memory bandwidth | 1533
GPT-OSS | llm | matmul | 0.7 | 💾 memory bandwidth | 224
Gemma 4 26B | llm | matmul | 0.7 | 💾 memory bandwidth | 166
DeepSeek V4 Flash | llm | matmul | 0.8 | 💾 memory bandwidth | 157
Mistral Small 4 | llm | matmul | 0.6 | 💾 memory bandwidth | 72
Llama 4 Maverick | llm | matmul | 0.8 | 💾 memory bandwidth | 71
Needs efficiency calibration + concurrency sweep + TCO estimate → evaluate in calculator →
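The ridge point and the bottleneck labels in the table follow directly from the classic roofline model. A minimal sketch using only this card's vendor peak numbers (function and variable names are illustrative):

```python
# Roofline sketch behind the operator-level table. An operator whose
# arithmetic intensity (AI, FLOPs/byte) is below the ridge point is
# memory-bandwidth bound; above it, compute bound. No efficiency applied.

PEAK_BF16_FLOPS = 459e12     # vendor claimed
HBM_BYTES_PER_S = 2_765e9    # vendor claimed

RIDGE = PEAK_BF16_FLOPS / HBM_BYTES_PER_S   # ~166 FLOPs/byte

def tok_s_upper_bound(flops_per_token: float, bytes_per_token: float):
    """Return (tok/s upper bound, bottleneck) = the lower of the two roofs."""
    compute_roof = PEAK_BF16_FLOPS / flops_per_token
    bandwidth_roof = HBM_BYTES_PER_S / bytes_per_token
    if flops_per_token / bytes_per_token >= RIDGE:
        return compute_roof, "compute"
    return bandwidth_roof, "memory bandwidth"
```

This is why matmul-heavy decode at AI ≈ 245 (first row) hits the compute roof, while every row with AI below 1 sits far under the memory-bandwidth roof.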

Operator support & optimization headroom


Per-operator support is derived from software_support.engines plus the scale-up topology. Optimization headroom comes from the measured efficiency factor.

Optimization headroom
+50 pp
moderate

No cases yet; the default 0.5 efficiency is assumed. Real headroom is unknown until the first measurement lands (see the sketch below).
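The "+50 pp" badge is just the gap between the assumed efficiency and peak; a sketch of the presumed bookkeeping (the helper name is hypothetical):

```python
# Headroom sketch: with no measured cases, a default efficiency of 0.5
# against vendor peak is assumed, leaving up to 50 percentage points
# between assumed throughput and the theoretical roofline.

def headroom_pp(measured_efficiency: float = 0.5) -> float:
    """Optimization headroom in percentage points versus vendor peak."""
    return (1.0 - measured_efficiency) * 100.0

print(headroom_pp())   # 50.0 -> the "+50 pp / moderate" badge above
```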

Communication (collective)
All-to-All 🟢 mature
all-to-all via ICI world_size=8960
AllReduce 🟢 mature
ICI ring all-reduce
Attention
Multi-Head Attention 🟢 mature
paged-attention via vLLM/SGLang/MindIE
FlashAttention-3 🔴 gap
No FA-3 path; falls back to FA-2 / vanilla SDPA
Matrix multiply (GEMM)
Matrix Multiplication 🟢 mature
GEMM supported on all inference engines
MoE routing
MoE Routing 🟢 mature
MoE gating supported via vLLM ≥0.4 / SGLang
Normalization
RMSNorm 🟢 mature
fused into engine kernels
Embedding
fused into engine kernels
Activation
SiLU / Swish 🟢 mature
fused into engine kernels
Softmax 🟢 mature
fused into engine kernels

Software-stack support

Engine | Status | BF16 | FP16 | FP4 | FP8 E4M3 | FP8 E5M2 | INT4 AWQ
HanGuangAI unconfirmed
LMDeploy unconfirmed
MindIE unconfirmed
MoRI unconfirmed
SGLang unconfirmed
TensorRT-LLM (Dynamo) unconfirmed
vLLM community

Existing deployment cases (0)

No measured cases yet for this card. Be the first contributor?

Citations

  [1] Google Cloud TPU v5p documentation — https://cloud.google.com/tpu/docs/v5p · accessed 2026-04-28 (vendor claimed)
  [2] TPU v5p: 4 systolic-array TensorCores per chip + scalar/vector units, 4× HBM2e ⇒ 95 GB; 3D-torus ICI fabric (up to 8960 chips/pod); TSMC 5nm-class — https://cloud.google.com/tpu/docs/system-architecture-tpu-vm · accessed 2026-04-28 (vendor claimed)
⚠ TPU v5p is only available via Google Cloud.