SC25 gLLM

Haibin
Paper Reading
2025-07-23

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

Several parallelism methods

Attempts to eliminate pipeline bubbles:

LLM inference currently suffers from two kinds of imbalance:
Inter-stage imbalance
inter-stage dependency, where a stage cannot begin computation until the preceding stage completes

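For example, take two GPUs. In example 1, GPU1 computes micro-batch 1 and passes it to GPU2, after which GPU2 sits idle until GPU1 finishes micro-batch 2. In example 2, GPU2 has computed micro-batch 1, but before GPU1 can process 3 it must wait for the result of 1 to be sent back.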
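The stage dependency in example 1 can be made concrete with a toy timing model. This is a minimal sketch, not gLLM's scheduler: it assumes each micro-batch's compute time per stage is known, ignores communication cost, and has no autoregressive loop-back. A stage starts a micro-batch only after the previous stage has finished it and the stage itself is free. With the same total work, an uneven pair of micro-batches (1 unit, then 3 units) leaves more idle time than an even pair (2 and 2). All function names are hypothetical.

```python
from typing import List


def pipeline_finish_times(times: List[List[float]]) -> List[List[float]]:
    """times[s][m] is the compute time of micro-batch m on stage s.

    Returns finish[s][m] for a simple in-order pipeline: stage s may start
    micro-batch m only after stage s-1 has finished m (inter-stage
    dependency) and stage s itself has finished m-1.
    """
    num_stages, num_mb = len(times), len(times[0])
    finish = [[0.0] * num_mb for _ in range(num_stages)]
    for s in range(num_stages):
        for m in range(num_mb):
            input_ready = finish[s - 1][m] if s > 0 else 0.0
            stage_free = finish[s][m - 1] if m > 0 else 0.0
            finish[s][m] = max(input_ready, stage_free) + times[s][m]
    return finish


def bubble_time(times: List[List[float]]) -> float:
    """Idle time summed over all stages, measured up to the last finish."""
    finish = pipeline_finish_times(times)
    makespan = finish[-1][-1]
    return sum(makespan - sum(stage_times) for stage_times in times)


# Two GPUs (stages), two micro-batches, same total work in both cases.
unbalanced = [[1.0, 3.0], [1.0, 3.0]]  # small micro-batch 1, large micro-batch 2
balanced = [[2.0, 2.0], [2.0, 2.0]]

print(pipeline_finish_times(unbalanced))  # [[1.0, 4.0], [2.0, 7.0]]
print(bubble_time(unbalanced), bubble_time(balanced))  # 6.0 4.0
```

In the unbalanced case GPU2 finishes micro-batch 1 at t=2 but cannot start micro-batch 2 until t=4, which is exactly the idle gap described in example 1.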

Inter-batch imbalance
where the number of concurrent micro-batches is limited by the pipeline depth.
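Why pipeline depth caps concurrency during decoding can be checked with a unit-time simulation. This is a toy sketch under stated assumptions: every stage takes one time step per micro-batch, each stage processes at most one micro-batch per step, and after the last stage a micro-batch's new token is fed back to stage 0 for its next decode iteration. Because a micro-batch can occupy only one stage at a time, utilization approaches min(K, D)/D for K micro-batches and D stages, so going beyond D micro-batches adds no parallelism. The function name is hypothetical.

```python
from collections import deque


def decode_utilization(num_stages: int, num_microbatches: int, steps: int) -> float:
    """Fraction of stage-steps spent computing in a looping decode pipeline."""
    # queues[s] holds micro-batches waiting for stage s; all start at stage 0.
    queues = [deque() for _ in range(num_stages)]
    queues[0].extend(range(num_microbatches))
    busy = 0
    for _ in range(steps):
        moved = []
        for s in range(num_stages):
            if queues[s]:
                mb = queues[s].popleft()
                busy += 1
                # Output goes to the next stage; after the last stage it wraps
                # back to stage 0 (the next token depends on this one).
                moved.append(((s + 1) % num_stages, mb))
        # Apply moves after the step so a micro-batch advances one stage per step.
        for dst, mb in moved:
            queues[dst].append(mb)
    return busy / (num_stages * steps)


for k in (2, 4, 8):
    print(k, round(decode_utilization(num_stages=4, num_microbatches=k, steps=1000), 3))
```

With 4 stages, 2 micro-batches keep the GPUs only about half busy, while 4 and 8 micro-batches give the same near-full utilization.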

With PP, stages only need to pass along their computed results (the intermediate activations).
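A back-of-the-envelope count shows why passing only activations is cheap. This sketch assumes PP sends just the hidden-state tensor (num_tokens × hidden_size elements) across each stage boundary. For contrast it also counts Megatron-style tensor parallelism, assumed here to do two all-reduces per transformer layer on a tensor of the same size. It counts tensor sizes, not actual bytes on the wire, and the model configuration is hypothetical.

```python
def activation_bytes(num_tokens: int, hidden_size: int, bytes_per_elem: int = 2) -> int:
    """Size of the hidden-state tensor handed from one PP stage to the next."""
    return num_tokens * hidden_size * bytes_per_elem


def comm_tensor_bytes_per_forward(num_tokens: int, hidden_size: int, num_layers: int,
                                  num_stages: int, bytes_per_elem: int = 2) -> tuple:
    act = activation_bytes(num_tokens, hidden_size, bytes_per_elem)
    # PP: one point-to-point send per stage boundary.
    pp = (num_stages - 1) * act
    # Assumption: Megatron-style TP, two all-reduces per layer on an
    # activation-sized tensor (tensor size only, not ring all-reduce traffic).
    tp = 2 * num_layers * act
    return pp, tp


# Hypothetical config: 32 decode tokens, hidden size 4096, 32 layers, fp16, 4 GPUs.
pp, tp = comm_tensor_bytes_per_forward(32, 4096, 32, 4)
print(pp, tp)  # 786432 16777216
```

Under these assumptions PP moves 3 activation tensors per forward pass while TP synchronizes 64, which is why PP tolerates slower interconnects.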


