DeepSeek R1

Developer: DeepSeek · Type: MoE · Modality: text · License: deepseek-license · Released: 2025-01-20

Architecture

Total params:   671 B
Active params:  37 B
Layers:         61
Context:        128 K tokens

Detailed specs

Hidden size:      7168
FFN size:         18432
Attention heads:  128
KV heads:         128
Head dim:         128
Vocab size:       129280
Attention type:   MLA (Multi-head Latent Attention)
MoE experts:      256
MoE top-k:        8
Expert hidden:    2048
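
The MoE config above (256 routed experts, top-k = 8) can be sketched as a simple router. This is an illustrative sketch only: the softmax-over-gate-logits scoring used here is an assumption for clarity, and DeepSeek's production gating differs in detail (e.g. scoring function and load-balancing terms).

```python
import numpy as np

# Config taken from the spec table above; the gate weight matrix and
# softmax scoring are illustrative assumptions, not the released weights.
N_EXPERTS, TOP_K, HIDDEN = 256, 8, 7168

rng = np.random.default_rng(0)
W_gate = rng.standard_normal((HIDDEN, N_EXPERTS)) / np.sqrt(HIDDEN)

def route(x):
    """Return (top-k expert indices, normalized mixing weights) for one token."""
    logits = x @ W_gate                                # (N_EXPERTS,)
    idx = np.argpartition(logits, -TOP_K)[-TOP_K:]     # ids of the k largest
    w = np.exp(logits[idx] - logits[idx].max())        # stable softmax over top-k
    return idx, w / w.sum()

x = rng.standard_normal(HIDDEN)
experts, weights = route(x)
```

Each token thus activates only 8 of the 256 experts per MoE layer, which is why the active parameter count (37 B) is a small fraction of the total (671 B).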

Operator decomposition (per token)

Operator    FLOPs / token    Bytes / token
matmul      4.84e+10         4.84e+10
attention   1.61e+10         3.22e+10
moe-gate    1.12e+8          1.43e+10
rmsnorm     4.37e+6          1.75e+6
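
Dividing each operator's FLOPs/token by its bytes/token gives its arithmetic intensity (FLOPs per byte moved), a rough indicator of whether the operator is memory-bound or compute-bound relative to a given accelerator's compute-to-bandwidth ratio. A small sketch using the table's values:

```python
# (FLOPs/token, bytes/token) per operator, copied from the table above.
ops = {
    "matmul":    (4.84e10, 4.84e10),
    "attention": (1.61e10, 3.22e10),
    "moe-gate":  (1.12e8,  1.43e10),
    "rmsnorm":   (4.37e6,  1.75e6),
}

# Arithmetic intensity = FLOPs per byte of memory traffic.
intensity = {name: flops / nbytes for name, (flops, nbytes) in ops.items()}
for name, ai in sorted(intensity.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {ai:.3g} FLOPs/byte")
```

At these intensities every operator in the table sits well below the compute-to-bandwidth ratio of modern accelerators, so single-token decoding is memory-bandwidth-bound; the moe-gate row is the most extreme, moving far more bytes (the expert weights it touches) than the FLOPs it performs.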

Compatible hardware