10 operators · FLOPs / bytes formulas · cross-hardware implementation differences

SiLU / Swish (activation): Sigmoid-weighted Linear Unit; default activation in modern LLMs (Llama, Qwen)
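A minimal NumPy sketch of the elementwise definition (function name and use of NumPy are illustrative; production kernels fuse this with the gated-FFN multiply):

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    """SiLU / Swish: x * sigmoid(x), applied elementwise."""
    # Elementwise op: a handful of FLOPs per element, one read + one write per element,
    # so it is memory-bandwidth-bound rather than compute-bound.
    return x * (1.0 / (1.0 + np.exp(-x)))
```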
Softmax: applied over attention scores; numerically stable form
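A sketch of the numerically stable form in NumPy (illustrative only): subtracting the row max before exponentiating leaves the result unchanged, because the factor exp(-max) cancels between numerator and denominator, but it prevents overflow for large scores.

```python
import numpy as np

def softmax_stable(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax with the max-subtraction trick for numerical stability."""
    m = scores.max(axis=axis, keepdims=True)   # row-wise max
    e = np.exp(scores - m)                     # all exponents <= 0, so no overflow
    return e / e.sum(axis=axis, keepdims=True)
```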
Multi-Head Attention (MHA): standard multi-head attention; FLOPs and bytes are counted per layer, per token
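A rough per-layer, per-token cost model under common assumptions (a decode step with FP16/BF16 weights and KV cache; the symbols h for hidden size and s for context length are illustrative, not from the original):

```python
def mha_cost_per_token(h: int, s: int, dtype_bytes: int = 2):
    """Rough per-layer, per-token decode cost for standard MHA.

    h           - hidden size (n_heads * head_dim)
    s           - current context length (tokens already in the KV cache)
    dtype_bytes - bytes per element (2 for FP16/BF16)
    """
    proj_flops = 2 * 4 * h * h          # Q, K, V, O projections: four h x h GEMVs
    attn_flops = 2 * 2 * h * s          # q @ K^T and p @ V over s cached tokens
    flops = proj_flops + attn_flops
    weight_bytes = 4 * h * h * dtype_bytes   # projection weights streamed from HBM
    kv_bytes = 2 * s * h * dtype_bytes       # K and V cache reads
    return flops, weight_bytes + kv_bytes
```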
FlashAttention-3: Hopper-optimized FlashAttention v3 using TMA and FP8 paths
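The Hopper-specific TMA and FP8 paths do not reduce to a short host-side example, but the online-softmax tiling that all FlashAttention versions build on can be sketched in NumPy for a single query (shapes and block size are illustrative):

```python
import numpy as np

def online_softmax_attention(q, k, v, block: int = 64):
    """One query attended over (S, d) keys/values without materializing the full score row."""
    d = q.shape[0]
    m = -np.inf                                   # running max of the scores
    l = 0.0                                       # running sum of exp(score - m)
    acc = np.zeros_like(v[0], dtype=np.float64)   # running weighted sum of values
    for start in range(0, k.shape[0], block):
        s = k[start:start + block] @ q / np.sqrt(d)   # scores for this tile only
        m_new = max(m, s.max())
        alpha = np.exp(m - m_new)                     # rescale what was accumulated so far
        p = np.exp(s - m_new)
        l = l * alpha + p.sum()
        acc = acc * alpha + p @ v[start:start + block]
        m = m_new
    return acc / l
```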
All-to-All: used in expert parallelism for token dispatch and combine
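A single-process NumPy simulation of the dispatch/combine data movement (function and variable names are hypothetical; a real system would perform the exchange with an NCCL or MPI all-to-all):

```python
import numpy as np

def dispatch_combine_sim(tokens: np.ndarray, expert_rank: np.ndarray, n_ranks: int):
    """Simulate all-to-all dispatch/combine.

    tokens      - (T, d) activations held on this rank
    expert_rank - (T,) destination rank of the expert chosen for each token (non-negative ints)
    Returns the tokens grouped by destination rank, the per-rank send counts
    (the all-to-all payload sizes), and the permutation that combines expert
    outputs back into the original token order.
    """
    order = np.argsort(expert_rank, kind="stable")             # group tokens by destination
    send_counts = np.bincount(expert_rank, minlength=n_ranks)  # payload per destination rank
    dispatched = tokens[order]                                 # what would go over NVLink / IB
    inverse = np.argsort(order)                                # combine: undo the permutation
    return dispatched, send_counts, inverse
```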
All-Reduce: ring or tree all-reduce across the TP group; bandwidth-dominated
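The bandwidth-dominated claim follows from the standard ring all-reduce traffic estimate; a small helper (illustrative):

```python
def ring_allreduce_bytes_per_rank(message_bytes: int, n_ranks: int) -> float:
    """Ring all-reduce sends and receives 2 * (N - 1) / N * message_size bytes per rank.

    As N grows this approaches 2x the message size, so for large tensors the time is
    set by link bandwidth rather than by per-hop latency.
    """
    return 2.0 * (n_ranks - 1) * message_bytes / n_ranks
```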
RoPE: rotary positional embedding applied to the Q and K projections
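A NumPy sketch using the interleaved-pair convention (implementations differ: some rotate interleaved channel pairs, others split the head dimension in half; base 10000 is a common default but is model-specific):

```python
import numpy as np

def apply_rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate each channel pair (2i, 2i+1) of x (seq, head_dim) by pos * base**(-2i/d)."""
    seq, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)        # (d/2,) per-pair frequencies
    angles = positions[:, None] * inv_freq[None, :]     # (seq, d/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```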
Matmul / GEMM: general matrix multiply; base operator for FFN and attention projections
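A small helper for the roofline-style accounting implied by the FLOPs / bytes framing (the dimension names m, n, k and the FP16 default are illustrative):

```python
def gemm_flops_bytes(m: int, n: int, k: int, dtype_bytes: int = 2):
    """FLOPs and minimum HBM traffic for an (m x k) @ (k x n) GEMM.

    FLOPs = 2 * m * n * k (one multiply + one add per term); bytes assume each
    matrix is read or written exactly once. Their ratio is the arithmetic
    intensity that decides compute- vs. memory-bound on a roofline plot.
    """
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * dtype_bytes
    return flops, bytes_moved, flops / bytes_moved
```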
MoE Router (top-k gating): top-k expert selection via softmax over router logits
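A NumPy sketch of top-k gating (illustrative; whether the gate weights are renormalized over the selected experts, and whether softmax is taken before or after the top-k, varies by model):

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int = 2):
    """Softmax over router logits, keep the k largest experts per token.

    logits: (tokens, n_experts). Returns expert indices (tokens, k) and their
    gate weights renormalized over the chosen experts.
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    idx = np.argsort(-probs, axis=-1)[:, :k]               # top-k expert ids per token
    gates = np.take_along_axis(probs, idx, axis=-1)
    gates /= gates.sum(axis=-1, keepdims=True)             # renormalize over chosen experts
    return idx, gates
```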
RMSNorm: root mean square normalization; cheaper than LayerNorm, common in modern LLMs
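A NumPy sketch; compared with LayerNorm it skips the mean subtraction and the bias term, which is where the savings come from (the eps value is illustrative):

```python
import numpy as np

def rmsnorm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last dim: scale by the inverse RMS, then by a learned weight."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```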