MiniMax M2.7

minimax HYBRID text MIT 2026-04-10

Architecture

Total params: 456 B
Active params: 46 B
Layers: 80
Context: 4096 k
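
The headline figures above imply an active-parameter share and a raw weight footprint. The following is a minimal sketch of that arithmetic, assuming "B" means 10^9 parameters. The byte widths per dtype are assumptions added for illustration, since the card does not state a storage precision; the function names are hypothetical.

```python
# Headline figures copied from the card; "B" is read as 10^9 parameters.
TOTAL_PARAMS = 456e9
ACTIVE_PARAMS = 46e9

# Assumed storage widths in bytes per parameter (not stated on the card).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters used for a single token's forward pass."""
    return active / total


def weight_footprint_gb(params: float, dtype: str) -> float:
    """Weight storage in decimal gigabytes for the assumed dtype."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")
    for dtype in BYTES_PER_PARAM:
        gb = weight_footprint_gb(TOTAL_PARAMS, dtype)
        print(f"{dtype}: {gb:.0f} GB of weights")
```

Under these assumptions roughly one tenth of the weights are touched per token, while all of them still have to be resident in memory.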

Detailed specs

Hidden size: 6144
FFN size: 16384
Attention heads: 64
KV heads: 8
Head dim: 128
Vocab size: 200000
Attention type: lightning
MoE experts: 32
MoE top-k: 2
Expert hidden: 16384
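
A few derived shapes follow directly from the listed specs. The sketch below computes them, assuming the head counts and head dim are used as listed in a grouped-query layout. The card does not say how "lightning" attention uses the KV heads, so the grouping is an assumption, and the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Values copied from the card's detailed specs."""
    hidden_size: int = 6144
    num_heads: int = 64
    num_kv_heads: int = 8
    head_dim: int = 128

    @property
    def q_width(self) -> int:
        # Total width of the query projection output.
        return self.num_heads * self.head_dim

    @property
    def kv_width(self) -> int:
        # Width of each of the key and value projection outputs.
        return self.num_kv_heads * self.head_dim

    @property
    def queries_per_kv_head(self) -> int:
        # Assumed grouped-query sharing: query heads per KV head.
        return self.num_heads // self.num_kv_heads


if __name__ == "__main__":
    shape = AttentionShape()
    print("q width:", shape.q_width)
    print("kv width:", shape.kv_width)
    print("queries per KV head:", shape.queries_per_kv_head)
    print("q width equals hidden size:", shape.q_width == shape.hidden_size)
```

With the listed values the query width (64 x 128) is larger than the hidden size, so under this reading the query and output projections would be rectangular rather than square.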

Operator decomposition (per token)

Operator     FLOPs / token   Bytes / token
matmul       4.83e+10        4.83e+10
attention    1.61e+10        2.52e+10
moe-gate     1.57e+7         3.22e+10
rmsnorm      4.92e+6         1.97e+6
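
Dividing each row's FLOPs by its bytes gives an arithmetic intensity, a rough indicator of whether an operator is limited by compute or by memory traffic. The sketch below does that division on the table values as printed. It also includes `dense_ffn_flops`, a hypothetical helper showing one accounting that happens to reproduce the matmul figure: 2 FLOPs per multiply-accumulate, three FFN projections, one FFN per layer. The card does not state its formula, so treat that part as a guess.

```python
# (FLOPs per token, bytes per token), copied from the table above.
TABLE = {
    "matmul": (4.83e10, 4.83e10),
    "attention": (1.61e10, 2.52e10),
    "moe-gate": (1.57e7, 3.22e10),
    "rmsnorm": (4.92e6, 1.97e6),
}


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved."""
    return flops / bytes_moved


def dense_ffn_flops(hidden: int, ffn: int, layers: int) -> int:
    """Assumed accounting: 2 FLOPs per MAC, 3 projections, 1 FFN per layer."""
    return 2 * 3 * hidden * ffn * layers


if __name__ == "__main__":
    for name, (flops, bytes_moved) in TABLE.items():
        ratio = arithmetic_intensity(flops, bytes_moved)
        print(f"{name:10s} {ratio:.3g} FLOPs/byte")
    # Compare the assumed formula against the table's matmul entry.
    print(f"assumed FFN formula: {dense_ffn_flops(6144, 16384, 80):.2e}")
```

By this measure every operator in the table sits at a few FLOPs per byte or lower, with moe-gate far below the rest.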

Compatible hardware