Mistral Small 4

mistral MOE text Apache-2.0 2026-03-16

Architecture

Total params: 119 B
Active params: 22 B
Layers: 40
Context: 128k tokens

Detailed specifications

Hidden size: 5120
FFN size: 14336
Attention heads: 32
KV heads: 8
Head dim: 128
Vocab size: 131072
Attention type: GQA (grouped-query attention)
MoE experts: 8
MoE top-k: 2
Expert hidden: 14336
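
The KV-cache footprint follows directly from the layer count, KV heads, and head dim listed above. The sketch below is a rough estimate under stated assumptions, not a figure from the card: it assumes a 2-byte (fp16/bf16) cache, that "128 k" means 131,072 tokens, and that all 40 layers cache keys and values for every token (no sliding window or cache compression).

```python
# Values copied from the spec card above.
LAYERS = 40
KV_HEADS = 8
HEAD_DIM = 128

# Assumptions, not stated on the card.
CONTEXT_TOKENS = 128 * 1024  # "128 k" read as 131,072 tokens
BYTES_PER_ELEMENT = 2        # fp16 / bf16 KV cache


def kv_cache_bytes_per_token(layers=LAYERS, kv_heads=KV_HEADS,
                             head_dim=HEAD_DIM,
                             bytes_per_element=BYTES_PER_ELEMENT):
    """Bytes of KV cache one token occupies across all layers."""
    # One key vector and one value vector per KV head, per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_element


per_token = kv_cache_bytes_per_token()
full_context = per_token * CONTEXT_TOKENS

print(f"KV cache per token:       {per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {full_context / 2**30:.1f} GiB")
```

Because the cache scales with KV heads (8) rather than attention heads (32), GQA cuts this footprint to a quarter of what full multi-head attention would need at the same head dim.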

Operator breakdown (per token)

Operator     FLOPs / token   Bytes / token
matmul       1.76e+10        1.76e+10
attention    5.87e+9         9.23e+9
moe-gate     1.64e+6         1.17e+10
rmsnorm      2.05e+6         8.19e+5
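
Dividing each operator's FLOPs by its bytes gives its arithmetic intensity (FLOPs per byte), the quantity a roofline analysis compares against a device's ratio of peak compute to memory bandwidth. The sketch below only re-derives that ratio from the table values; the dictionary and function names are illustrative, and how the card counts FLOPs and bytes is not stated here.

```python
# (FLOPs per token, bytes per token), copied from the table above.
OPERATORS = {
    "matmul":    (1.76e10, 1.76e10),
    "attention": (5.87e9,  9.23e9),
    "moe-gate":  (1.64e6,  1.17e10),
    "rmsnorm":   (2.05e6,  8.19e5),
}


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved."""
    return flops / bytes_moved


intensities = {
    name: arithmetic_intensity(flops, bytes_moved)
    for name, (flops, bytes_moved) in OPERATORS.items()
}

# Aggregate intensity over all listed operators.
total_flops = sum(flops for flops, _ in OPERATORS.values())
total_bytes = sum(bytes_moved for _, bytes_moved in OPERATORS.values())
overall = arithmetic_intensity(total_flops, total_bytes)

for name, value in intensities.items():
    print(f"{name:<10} {value:.3g} FLOPs/byte")
print(f"{'overall':<10} {overall:.3g} FLOPs/byte")
```

Every operator lands at a few FLOPs per byte or less, with matmul at exactly 1.0. Accelerators typically need far higher intensity than that to become compute-bound, so these per-token figures suggest that single-token decoding is limited by memory bandwidth rather than compute.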

Compatible hardware