DeepSeek V4 Pro

deepseek · MoE · text · deepseek-license · 2026-04-24

Architecture

Total params: 1600 B
Active params: 49 B
Layers: 64
Context: 1024k tokens
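The headline figures above imply how sparse the model is per token. The following is a minimal sketch of that arithmetic, assuming "B" means billions of parameters. The `bytes_per_param` argument is a hypothetical input: the card does not state a storage precision, so the memory figure is only a rough lower bound on weight storage.

```python
# Values copied from the spec card; "B" is assumed to mean billions of parameters.
TOTAL_PARAMS_B = 1600
ACTIVE_PARAMS_B = 49


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters that participate in computing one token."""
    return active_b / total_b


def weight_memory_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring any overhead.

    bytes_per_param is an assumption supplied by the caller (e.g. 2 for 16-bit,
    1 for 8-bit weights); the card does not specify a precision.
    """
    return params_b * bytes_per_param


frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"active fraction: {frac:.2%}")
for bpp in (2, 1):
    print(f"weights at {bpp} byte(s)/param: {weight_memory_gb(TOTAL_PARAMS_B, bpp):.0f} GB")
```

Under these assumptions, roughly 3% of the parameters are active for any given token, while the full parameter set still has to be resident in (or reachable from) memory.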

Detailed specifications

Hidden size: 8192
FFN size: 24576
Attention heads: 64
KV heads: 8
Head dim: 128
Vocab size: 132000
Attention type: csa+hca
MoE experts: 256
MoE top-k: 8
Expert hidden: 2048
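Several of these numbers can be cross-checked against each other. The sketch below derives a few ratios from the listed values only. The dictionary keys and the `derived_ratios` helper are illustrative names, not part of any published API. It also assumes that "MoE top-k" is the number of experts routed per token, and that the KV-head count can be read as a grouped-query layout. The card does not confirm how the csa+hca attention type uses those heads.

```python
# Values copied from the spec card; key names are hypothetical.
SPEC = {
    "hidden_size": 8192,
    "ffn_size": 24576,
    "attention_heads": 64,
    "kv_heads": 8,
    "head_dim": 128,
    "moe_experts": 256,
    "moe_top_k": 8,
    "expert_hidden": 2048,
}


def derived_ratios(spec: dict) -> dict:
    """Compute simple consistency checks and ratios from the raw spec values."""
    return {
        # Total width of all query heads; compare with hidden_size.
        "query_width": spec["attention_heads"] * spec["head_dim"],
        # Total width of all key (or value) heads.
        "kv_width": spec["kv_heads"] * spec["head_dim"],
        # Query heads sharing each KV head (assumes a grouped-query layout).
        "queries_per_kv_head": spec["attention_heads"] // spec["kv_heads"],
        # Fraction of experts selected per token (assumes top-k routing).
        "routed_expert_share": spec["moe_top_k"] / spec["moe_experts"],
        # Combined hidden width of the experts selected for one token.
        "active_expert_width": spec["moe_top_k"] * spec["expert_hidden"],
        # FFN size relative to the hidden size.
        "ffn_to_hidden": spec["ffn_size"] / spec["hidden_size"],
    }


ratios = derived_ratios(SPEC)
for name, value in ratios.items():
    print(f"{name}: {value}")
```

With the values as listed, the query width (64 × 128) equals the hidden size of 8192, and 8 of 256 experts corresponds to about 3.1% of experts per token.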

Operator breakdown (per token)

Operator    FLOPs / token   Bytes / token
matmul      4.80e+9         1.80e+7
attention   1.20e+9         4.50e+6
moe-gate    1.00e+7         1.00e+6
rmsnorm     5.00e+6         1.00e+6
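Dividing FLOPs by bytes for each row gives the arithmetic intensity, a common way to judge whether an operator is more likely limited by compute or by memory bandwidth on a given device. The sketch below computes it from the table values only. The `OPS` dictionary and the `arithmetic_intensity` helper are illustrative names, and the card does not say how the per-token byte counts were measured (batch size, precision, caching), so the ratios should be read as rough indicators.

```python
# (FLOPs per token, bytes per token), copied from the operator table.
OPS = {
    "matmul": (4.80e9, 1.80e7),
    "attention": (1.20e9, 4.50e6),
    "moe-gate": (1.00e7, 1.00e6),
    "rmsnorm": (5.00e6, 1.00e6),
}


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved; higher values lean compute-bound."""
    return flops / bytes_moved


for name, (flops, bytes_moved) in OPS.items():
    ai = arithmetic_intensity(flops, bytes_moved)
    print(f"{name:<10} {ai:8.1f} FLOPs/byte")

# Aggregate over all listed operators.
total_flops = sum(flops for flops, _ in OPS.values())
total_bytes = sum(bytes_moved for _, bytes_moved in OPS.values())
overall = arithmetic_intensity(total_flops, total_bytes)
print(f"{'overall':<10} {overall:8.1f} FLOPs/byte")
```

On these figures, matmul and attention both come out near 267 FLOPs per byte, while moe-gate (10) and rmsnorm (5) move far more data relative to the work they do. Comparing each ratio with a device's peak-FLOPs-to-bandwidth ratio indicates which side of the roofline an operator falls on.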

Compatible hardware