DeepSeek V4 Flash

deepseek MOE text deepseek-license 2026-04-24

Architecture

Total params: 284 B
Active params: 13 B
Layers: 32
Context: 1024 k

Detailed specs

Hidden size: 4096
FFN size: 14336
Attention heads: 32
KV heads: 8
Head dim: 128
Vocab size: 132000
Attention type: csa+hca
MoE experts: 64
MoE top-k: 4
Expert hidden: 1408
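
A few ratios follow directly from the figures listed above, and the sketch below computes them as a quick consistency check. The dictionary keys and the `derived` helper are illustrative names, not part of any published tooling. The KV-cache figure is only a baseline: it assumes plain grouped-query attention that stores one key and one value vector per KV head per layer at 2 bytes per value, and the listed `csa+hca` attention type may well store less.

```python
# Values copied from the spec sheet; the key names are illustrative.
SPEC = {
    "total_params_b": 284,
    "active_params_b": 13,
    "layers": 32,
    "hidden_size": 4096,
    "attention_heads": 32,
    "kv_heads": 8,
    "head_dim": 128,
    "moe_experts": 64,
    "moe_top_k": 4,
}


def derived(spec, kv_bytes_per_value=2):
    """Return simple ratios derived from the listed specs.

    kv_bytes_per_value=2 is an assumption (16-bit cache entries),
    not something the spec sheet states.
    """
    return {
        # Combined width of all query heads; compare with hidden_size.
        "attention_width": spec["attention_heads"] * spec["head_dim"],
        # Number of query heads sharing each KV head.
        "queries_per_kv_head": spec["attention_heads"] // spec["kv_heads"],
        # Share of routed experts selected per token.
        "expert_fraction": spec["moe_top_k"] / spec["moe_experts"],
        # Share of total parameters active per token.
        "active_param_fraction": spec["active_params_b"] / spec["total_params_b"],
        # Baseline KV-cache bytes per token: keys + values, every layer.
        # Assumes uncompressed grouped-query attention (see lead-in).
        "naive_kv_bytes_per_token": (
            2 * spec["layers"] * spec["kv_heads"] * spec["head_dim"] * kv_bytes_per_value
        ),
    }


summary = derived(SPEC)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these numbers the combined query-head width equals the hidden size (32 × 128 = 4096), each KV head serves 4 query heads, and 4 of 64 experts are selected per token.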

Operator decomposition (per token)

Operator     FLOPs / token    Bytes / token
matmul       1.13e+10         1.13e+10
attention    3.22e+9          4.83e+9
moe-gate     8.39e+6          1.48e+9
rmsnorm      1.31e+6          5.24e+5
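
One way to read the table is through arithmetic intensity, the ratio of FLOPs to bytes moved for each operator. The sketch below computes it from the listed values; the `OPERATORS` dictionary and the `arithmetic_intensity` helper are illustrative names, and the page does not say how the per-token figures were measured (batch size, precision, context length), so the ratios are only as meaningful as those unstated conditions.

```python
# (FLOPs per token, bytes per token), copied from the operator table.
OPERATORS = {
    "matmul": (1.13e10, 1.13e10),
    "attention": (3.22e9, 4.83e9),
    "moe-gate": (8.39e6, 1.48e9),
    "rmsnorm": (1.31e6, 5.24e5),
}


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved; low values suggest a memory-bound operator."""
    return flops / bytes_moved


intensity = {
    name: arithmetic_intensity(flops, bytes_moved)
    for name, (flops, bytes_moved) in OPERATORS.items()
}

total_flops = sum(flops for flops, _ in OPERATORS.values())
total_bytes = sum(bytes_moved for _, bytes_moved in OPERATORS.values())

# Print operators from lowest to highest intensity.
for name, value in sorted(intensity.items(), key=lambda item: item[1]):
    print(f"{name:10s} {value:.4f} FLOPs/byte")
print(f"{'overall':10s} {arithmetic_intensity(total_flops, total_bytes):.4f} FLOPs/byte")
```

On the listed figures, `moe-gate` has by far the lowest ratio and `rmsnorm` the highest, while `matmul` dominates both totals.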

Compatible hardware