Qwen2.5-Coder 32B Instruct

Alibaba | dense | text | Apache-2.0 | 2024-11-12

Architecture

Total params: 32 B
Active params: 32 B
Layers: 64
Context: 128 k

Detailed specs

Hidden size: 5120
FFN size: 27648
Attention heads: 40
KV heads: 8
Head dim: 128
Vocab size: 152064
Attention type: GQA
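
The specs above are enough for a rough cross-check of the parameter count and of the KV-cache cost per token. The following is a minimal sketch, not the method used to produce this page. It assumes a gated FFN with three projection matrices (suggested by the silu operator, but not stated here), untied input and output embeddings, and it ignores biases and normalization weights. The names `ModelSpec`, `estimate_params`, and `kv_cache_bytes_per_token` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Values copied from the spec listing above."""
    layers: int = 64
    hidden_size: int = 5120
    ffn_size: int = 27648
    attention_heads: int = 40
    kv_heads: int = 8
    head_dim: int = 128
    vocab_size: int = 152064


def estimate_params(spec: ModelSpec, tied_embeddings: bool = False) -> int:
    """Approximate weight count (assumption: biases and norm weights ignored)."""
    q_dim = spec.attention_heads * spec.head_dim   # query / output width
    kv_dim = spec.kv_heads * spec.head_dim         # key / value width under GQA

    # Q and O projections are hidden x q_dim; K and V are hidden x kv_dim.
    attention = 2 * spec.hidden_size * q_dim + 2 * spec.hidden_size * kv_dim

    # Assumption: gated FFN with gate, up, and down projections.
    ffn = 3 * spec.hidden_size * spec.ffn_size

    embedding = spec.vocab_size * spec.hidden_size
    # Assumption: a separate output head unless tied_embeddings is True.
    embedding_total = embedding if tied_embeddings else 2 * embedding

    return spec.layers * (attention + ffn) + embedding_total


def kv_cache_bytes_per_token(spec: ModelSpec, bytes_per_value: int = 2) -> int:
    """Bytes of K and V stored per token across all layers."""
    return 2 * spec.layers * spec.kv_heads * spec.head_dim * bytes_per_value


if __name__ == "__main__":
    spec = ModelSpec()
    print(f"estimated params: {estimate_params(spec) / 1e9:.2f} B")
    print(f"KV cache per token: {kv_cache_bytes_per_token(spec) / 1024:.0f} KiB")
```

Under these assumptions the estimate comes to roughly 32.8 B weights, close to the 32 B listed. With 2 bytes per cached value, the KV cache costs 256 KiB per token, which would be 32 GiB for a full context if "128 k" is read as 131,072 tokens (an assumption; the listing does not give the exact token count).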

Operator decomposition (per token)

Operator     FLOPs / token   Bytes / token
matmul       6.05e+10        6.05e+10
attention    3.50e+9         3.50e+9
rmsnorm      6.55e+6         8.19e+5
rope         3.28e+5         4.10e+4
silu         1.22e+8         6.08e+7
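
One way to read the operator table is through arithmetic intensity, the ratio of FLOPs to bytes moved, together with each operator's share of the total FLOPs. The sketch below only divides and sums the published figures. It does not reproduce how they were derived, and the table does not state the precision or batch size it assumes. The names `OPERATORS`, `arithmetic_intensity`, and `flop_share` are hypothetical.

```python
# (FLOPs per token, bytes per token), copied from the operator table.
OPERATORS = {
    "matmul": (6.05e10, 6.05e10),
    "attention": (3.50e9, 3.50e9),
    "rmsnorm": (6.55e6, 8.19e5),
    "rope": (3.28e5, 4.10e4),
    "silu": (1.22e8, 6.08e7),
}


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved."""
    return flops / bytes_moved


def flop_share(operators: dict) -> dict:
    """Fraction of the total per-token FLOPs attributed to each operator."""
    total = sum(flops for flops, _ in operators.values())
    return {name: flops / total for name, (flops, _) in operators.items()}


if __name__ == "__main__":
    shares = flop_share(OPERATORS)
    for name, (flops, bytes_moved) in OPERATORS.items():
        intensity = arithmetic_intensity(flops, bytes_moved)
        print(f"{name:<10} {intensity:5.2f} FLOPs/byte  {shares[name]:7.2%} of FLOPs")
```

By these figures, matmul and attention sit at about 1 FLOP per byte and together account for nearly all of the per-token FLOPs, while rmsnorm and rope have higher ratios (about 8) but contribute a negligible share.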

Compatible hardware