MiniMax M2.7

minimax | HYBRID | text | MIT license | 2026-04-10

Architecture

Total params     456B
Active params    46B
Layers           80
Context          4096K tokens
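The gap between total and active parameters is what makes an MoE model cheap per token but large to store. The sketch below turns the two parameter counts into raw weight sizes. The storage formats (bf16, fp8, int4) and their bytes per parameter are assumptions chosen for illustration, not formats the card says this model ships in. The "read per token" figure also assumes a batch-size-1 decode step that reads each active weight once.

```python
# Figures copied from the spec card
TOTAL_PARAMS = 456e9   # total parameters
ACTIVE_PARAMS = 46e9   # parameters active per token (MoE routing)

# Assumed bytes per parameter for a few common storage formats;
# ignores quantization scales, padding, and runtime buffers.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def active_fraction(total: float, active: float) -> float:
    """Share of all parameters that a single token actually uses."""
    return active / total


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        total_gb = weight_footprint_gb(TOTAL_PARAMS, nbytes)
        active_gb = weight_footprint_gb(ACTIVE_PARAMS, nbytes)
        print(f"{fmt}: {total_gb:.0f} GB stored, {active_gb:.0f} GB read per token")
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
```

At bf16 the full weights come to about 912 GB, while only about 10% of them (roughly 92 GB) are touched for any one token.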

Detailed specifications

Hidden size        6144
FFN size           16384
Attention heads    64
KV heads           8
Head dim           128
Vocab size         200000
Attention type     lightning
MoE experts        32
MoE top-k          2
Expert hidden      16384
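Several useful shapes follow directly from the listed head counts and expert counts. The sketch below derives them, assuming the conventional layout where the query projection width is heads × head dim and K/V use grouped-query attention with the listed KV heads. The KV-cache figure assumes bf16 storage and applies only to layers that use standard softmax attention with a KV cache; lightning attention is a linear-attention variant that keeps a fixed-size state instead, and the card does not say how many of the 80 layers use each type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    # Values copied from the detailed specifications
    hidden_size: int = 6144
    num_heads: int = 64
    num_kv_heads: int = 8
    head_dim: int = 128
    num_experts: int = 32
    top_k: int = 2


def derived_shapes(s: Spec, kv_bytes_per_elem: int = 2) -> dict:
    """Shapes implied by the spec; kv_bytes_per_elem=2 assumes bf16 K/V."""
    q_width = s.num_heads * s.head_dim       # query projection output width
    kv_width = s.num_kv_heads * s.head_dim   # K (and, separately, V) width
    return {
        "gqa_group": s.num_heads // s.num_kv_heads,  # query heads sharing one KV head
        "q_width": q_width,
        "kv_width": kv_width,
        # K plus V for one token in one softmax-attention layer
        "kv_cache_bytes_per_token_per_layer": 2 * kv_width * kv_bytes_per_elem,
        # share of experts a token is routed to in each MoE layer
        "expert_fraction": s.top_k / s.num_experts,
    }


if __name__ == "__main__":
    for key, value in derived_shapes(Spec()).items():
        print(f"{key}: {value}")
```

Note that the query width (64 × 128 = 8192) is larger than the hidden size (6144), so under this layout the attention output projection maps 8192 back down to 6144.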

Operator breakdown (per token)

Operator     FLOPs/token   Bytes/token
matmul       4.83e+10      4.83e+10
attention    1.61e+10      2.52e+10
moe-gate     1.57e+7       3.22e+10
rmsnorm      4.92e+6       1.97e+6
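Dividing each row's FLOPs by its bytes gives its arithmetic intensity, and a simple roofline model turns the pair into a per-token time lower bound. The sketch below uses the table's numbers as given; the accelerator peak (1e15 FLOP/s) and memory bandwidth (3e12 B/s) are hypothetical values picked only to illustrate the calculation, and summing the per-operator bounds assumes the operators run one after another without overlap.

```python
# Per-token (FLOPs, bytes) copied from the operator table
OPS = {
    "matmul": (4.83e10, 4.83e10),
    "attention": (1.61e10, 2.52e10),
    "moe-gate": (1.57e7, 3.22e10),
    "rmsnorm": (4.92e6, 1.97e6),
}


def intensity(flops: float, nbytes: float) -> float:
    """Arithmetic intensity in FLOPs per byte moved."""
    return flops / nbytes


def roofline_seconds(flops: float, nbytes: float,
                     peak_flops: float, bandwidth: float) -> float:
    """Lower-bound time: limited by either compute or memory traffic."""
    return max(flops / peak_flops, nbytes / bandwidth)


if __name__ == "__main__":
    # HYPOTHETICAL accelerator numbers, not tied to any real device
    PEAK, BW = 1e15, 3e12
    ridge = PEAK / BW  # intensity above which an op becomes compute-bound
    total = 0.0
    for name, (f, b) in OPS.items():
        t = roofline_seconds(f, b, PEAK, BW)
        total += t
        bound = "compute" if intensity(f, b) >= ridge else "memory"
        print(f"{name:10s} {intensity(f, b):9.4f} FLOP/B  {bound}-bound  {t * 1e3:.4f} ms")
    print(f"per-token lower bound: {total * 1e3:.2f} ms")
```

Every row has an intensity of at most a few FLOPs per byte, far below the ridge point of any plausible accelerator, so under these assumptions per-token decoding is bound by memory bandwidth rather than compute.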

Compatible hardware