DeepSeek R1

deepseek MOE text deepseek-license 2025-01-20

Architecture

Total params: 671 B
Active params: 37 B
Layers: 61
Context: 128 k
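
The gap between total and active parameters is what makes this a sparse model: only a small share of the weights is used for any one token. The sketch below works out that share from the two figures above and estimates the storage needed for the weights alone. The byte widths (1 and 2 bytes per parameter) are assumptions chosen for illustration, not values stated on this page, and the function names are hypothetical.

```python
# Values copied from the summary above.
TOTAL_PARAMS = 671e9   # 671 B
ACTIVE_PARAMS = 37e9   # 37 B


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters used in one token's forward pass."""
    return active / total


def weight_gigabytes(params: float, bytes_per_param: float) -> float:
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active share: {share:.1%}")
    # ASSUMPTION: these byte widths are illustrative, not taken from the page.
    for label, width in (("1 byte/param", 1), ("2 bytes/param", 2)):
        print(f"{label}: {weight_gigabytes(TOTAL_PARAMS, width):.0f} GB")
```

With the listed figures the active share comes to roughly 5.5%. The storage estimate covers weights only and ignores the KV cache and activations.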

Detailed specifications

Hidden size: 7168
FFN size: 18432
Attention heads: 128
KV heads: 128
Head dim: 128
Vocab size: 129280
Attention type: MLA (Multi-head Latent Attention)
MoE experts: 256
MoE top-k: 8
Expert hidden: 2048
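
The MoE figures above can be turned into a rough per-layer parameter count. This is a minimal sketch, assuming each expert is a gated feed-forward block with three bias-free weight matrices (gate, up, down); the page does not state the expert layout, so treat `MATRICES_PER_EXPERT` as an assumption. It counts only the 256 experts listed and says nothing about shared experts, dense layers, or attention weights.

```python
# Values copied from the detailed specifications above.
HIDDEN_SIZE = 7168
EXPERT_HIDDEN = 2048
MOE_EXPERTS = 256
MOE_TOP_K = 8

# ASSUMPTION: gated FFN with three projections (gate, up, down), no biases.
MATRICES_PER_EXPERT = 3


def expert_params(hidden: int, expert_hidden: int, matrices: int) -> int:
    """Weights in one expert: `matrices` projections of hidden x expert_hidden."""
    return matrices * hidden * expert_hidden


def routed_fraction(top_k: int, experts: int) -> float:
    """Share of the listed experts that a single token is routed to."""
    return top_k / experts


per_expert = expert_params(HIDDEN_SIZE, EXPERT_HIDDEN, MATRICES_PER_EXPERT)
per_layer_total = per_expert * MOE_EXPERTS   # all experts in one MoE layer
per_layer_active = per_expert * MOE_TOP_K    # experts used for one token

if __name__ == "__main__":
    print(f"params per expert:       {per_expert:,}")
    print(f"expert params per layer: {per_layer_total:,}")
    print(f"active per token/layer:  {per_layer_active:,}")
    print(f"routed share:            {routed_fraction(MOE_TOP_K, MOE_EXPERTS):.3%}")
```

Under that assumption, each token touches 8 of 256 experts in a layer, or 3.125% of the expert weights there.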

Operator breakdown (per token)

Operator    FLOPs / token    Bytes / token
matmul      4.84e+10         4.84e+10
attention   1.61e+10         3.22e+10
moe-gate    1.12e+8          1.43e+10
rmsnorm     4.37e+6          1.75e+6
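
One way to read the breakdown above is through arithmetic intensity, the ratio of FLOPs to bytes moved for each operator. A low ratio suggests the operator is limited by memory bandwidth rather than compute. The sketch below derives that ratio and each operator's share of total FLOPs from the table values only; the function names are hypothetical and no hardware figures are assumed.

```python
# (FLOPs per token, bytes per token), copied from the table above.
OPS = {
    "matmul": (4.84e10, 4.84e10),
    "attention": (1.61e10, 3.22e10),
    "moe-gate": (1.12e8, 1.43e10),
    "rmsnorm": (4.37e6, 1.75e6),
}


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved."""
    return flops / bytes_moved


def flops_share(ops: dict) -> dict:
    """Each operator's fraction of the summed FLOPs per token."""
    total = sum(flops for flops, _ in ops.values())
    return {name: flops / total for name, (flops, _) in ops.items()}


intensity = {name: arithmetic_intensity(f, b) for name, (f, b) in OPS.items()}
shares = flops_share(OPS)

if __name__ == "__main__":
    for name in OPS:
        print(f"{name:10s} {intensity[name]:8.4f} FLOP/byte  {shares[name]:7.2%} of FLOPs")
```

With the listed values, matmul sits at 1.0 FLOP/byte, attention at 0.5, rmsnorm at about 2.5, and moe-gate at about 0.008, which makes moe-gate the operator with the most data movement relative to its compute.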

Compatible hardware