Mistral Large 3

mistral DENSE text MRL-2 2025-08-14

Architecture

Total params: 123 B
Active params: 123 B
Layers: 88
Context: 125 k

Detailed specifications

Hidden size: 12288
FFN size: 28672
Attention heads: 96
KV heads: 8
Head dim: 128
Vocab size: 32768
Attention type: GQA (grouped-query attention)
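
As a sanity check on the figures above, the following minimal sketch estimates the weight count and the per-token KV-cache size from the listed dimensions. It assumes a gated FFN with three projection matrices, untied input and output embeddings, no bias terms, and 2 bytes per cached value; none of these is stated on this page, and normalization weights are ignored. The function names are hypothetical.

```python
# Dimensions copied from the spec card above.
HIDDEN = 12288
FFN = 28672
LAYERS = 88
HEADS = 96
KV_HEADS = 8
HEAD_DIM = 128
VOCAB = 32768


def estimate_params() -> int:
    """Rough weight count.

    Assumptions (not stated in the spec): gated FFN with three matrices,
    untied embeddings, no biases, normalization weights ignored.
    """
    q_and_o = 2 * HIDDEN * HEADS * HEAD_DIM      # query and output projections
    k_and_v = 2 * HIDDEN * KV_HEADS * HEAD_DIM   # key and value projections (GQA)
    ffn = 3 * HIDDEN * FFN                       # gate, up, and down projections
    embeddings = 2 * VOCAB * HIDDEN              # input embedding and output head
    return LAYERS * (q_and_o + k_and_v + ffn) + embeddings


def kv_cache_bytes(n_tokens: int, bytes_per_value: int = 2) -> int:
    """KV-cache size: one key and one value vector per KV head per layer."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * n_tokens


if __name__ == "__main__":
    print(f"estimated params: {estimate_params() / 1e9:.1f} B")
    print(f"KV cache per token: {kv_cache_bytes(1)} bytes")
```

Under these assumptions the estimate comes to about 122.6 B, in line with the 123 B listed, and the cache costs 360,448 bytes per token. With 96 KV heads instead of 8, the cache would be 12 times larger, which is the saving GQA provides here.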

Operator breakdown (per token)

Operator     FLOPs / token    Bytes / token
matmul       2.32e+11         2.32e+11
attention    1.25e+10         1.25e+10
rmsnorm      1.47e+7          1.97e+6
rope         7.86e+5          9.83e+4
silu         2.53e+8          1.26e+8

Compatible hardware