Qwen3.5 397B Reasoning

alibaba · MoE · text · Apache-2.0 · 2026-03-05

Architecture

Total params: 397 B
Active params: 22 B
Layers: 64
Context: 128 k

Detailed specifications

Hidden size: 5120
FFN size: 14336
Attention heads: 40
KV heads: 8
Head dim: 128
Vocab size: 152064
Attention type: gqa
MoE experts: 128
MoE top-k: 8
Expert hidden: 1536
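
The figures above are enough for a few back-of-the-envelope derived quantities. The sketch below estimates KV-cache size and checks that the head layout is self-consistent. It rests on assumptions the card does not state: cache elements are 2 bytes (fp16/bf16), every layer keeps a standard K and V cache, and "128 k" means 128 × 1024 tokens. The function name `kv_cache_bytes_per_token` is illustrative only.

```python
# Values copied from the spec card above.
LAYERS = 64
HIDDEN_SIZE = 5120
ATTENTION_HEADS = 40
KV_HEADS = 8
HEAD_DIM = 128
MOE_EXPERTS = 128
MOE_TOP_K = 8

# Assumptions (not stated on the card): 2-byte cache elements (fp16/bf16),
# every layer keeps a standard K and V cache, and "128 k" = 128 * 1024 tokens.
BYTES_PER_ELEMENT = 2
CONTEXT_TOKENS = 128 * 1024


def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_element):
    """Bytes of K and V cache appended per token, summed over all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_element


per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_ELEMENT)
full_context = per_token * CONTEXT_TOKENS

# Number of query heads that share each KV head under GQA.
gqa_group_size = ATTENTION_HEADS // KV_HEADS
# Fraction of experts the router selects for each token.
routed_expert_fraction = MOE_TOP_K / MOE_EXPERTS

print(f"heads x head_dim == hidden size: {ATTENTION_HEADS * HEAD_DIM == HIDDEN_SIZE}")
print(f"GQA group size: {gqa_group_size}")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {full_context / 2**30:.0f} GiB")
print(f"experts routed per token: {routed_expert_fraction:.2%}")
```

Under these assumptions the cache grows by 256 KiB per token and reaches 32 GiB for a single sequence at full context; a different cache precision scales both numbers proportionally.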

Operator breakdown (per token)

Operator     FLOPs / token    Bytes / token
matmul       2.82e+10         2.82e+10
attention    9.40e+9          1.45e+10
moe-gate     4.19e+7          8.05e+9
rmsnorm      3.28e+6          1.31e+6
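
One way to read this table is through arithmetic intensity, the ratio of FLOPs to bytes moved for each operator. The sketch below computes that ratio per operator and for the sum of all four rows, using only the numbers listed above. The helper name `arithmetic_intensity` is illustrative, and the card does not say which precision or batch size the byte counts assume.

```python
# (FLOPs per token, bytes per token), copied from the operator table above.
OPERATORS = {
    "matmul": (2.82e10, 2.82e10),
    "attention": (9.40e9, 1.45e10),
    "moe-gate": (4.19e7, 8.05e9),
    "rmsnorm": (3.28e6, 1.31e6),
}


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved."""
    return flops / bytes_moved


total_flops = sum(flops for flops, _ in OPERATORS.values())
total_bytes = sum(nbytes for _, nbytes in OPERATORS.values())

for name, (flops, nbytes) in OPERATORS.items():
    print(f"{name:<10} {arithmetic_intensity(flops, nbytes):8.4f} FLOPs/byte")
print(f"{'total':<10} {arithmetic_intensity(total_flops, total_bytes):8.4f} FLOPs/byte")
```

Comparing these ratios with a device's peak-FLOPs-to-memory-bandwidth ratio gives a rough indication of whether an operator is likely to be compute-bound or bandwidth-bound on that device.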

Compatible hardware