SC25 gLLM
- Paper Reading
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
Several parallelism strategies
Attempts to eliminate pipeline bubbles:
Current LLM inference suffers from two kinds of imbalance:
Inter-stage imbalance
inter-stage dependency, where a stage cannot begin computation until the preceding stage completes
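For example, take two GPUs. In example 1, GPU1 computes micro-batch 1 and passes it to GPU2; GPU2 then sits idle until GPU1 finishes micro-batch 2. In example 2, GPU2 has computed micro-batch 1, but before GPU1 can work on micro-batch 3 it has to wait for the result of 1 to be sent back.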
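The first case can be sketched as a toy pipeline model. This assumes each micro-batch's per-stage compute time is known and micro-batches are processed in order; the helper `stage_idle_times` is hypothetical and not from gLLM. A stage can start micro-batch i only once the previous stage has finished i and the stage itself has finished i-1, so any mismatch in stage times shows up as idle gaps on the faster stage.

```python
from typing import List


def stage_idle_times(durations: List[List[float]]) -> List[float]:
    """Toy pipeline model (hypothetical helper, not from gLLM).

    durations[i][s] is the compute time of micro-batch i on stage s.
    Returns, per stage, the total idle time between consecutive micro-batches
    (warm-up before the first micro-batch is not counted).
    """
    num_mb = len(durations)
    num_stages = len(durations[0])
    finish = [[0.0] * num_stages for _ in range(num_mb)]
    idle = [0.0] * num_stages
    for i in range(num_mb):
        for s in range(num_stages):
            # Input is ready once the previous stage has finished this micro-batch.
            input_ready = finish[i][s - 1] if s > 0 else 0.0
            # The stage is free once it has finished the previous micro-batch.
            stage_free = finish[i - 1][s] if i > 0 else 0.0
            start = max(input_ready, stage_free)
            if i > 0:
                idle[s] += start - stage_free
            finish[i][s] = start + durations[i][s]
    return idle


# Stage 0 (GPU1) is twice as slow as stage 1 (GPU2): GPU2 waits before every later micro-batch.
print(stage_idle_times([[2, 1], [2, 1], [2, 1]]))
# Balanced stages: no gaps.
print(stage_idle_times([[1, 1], [1, 1], [1, 1]]))
```

With the unbalanced durations, the second stage accumulates 2 units of idle time over three micro-batches, while balanced stages produce no gaps at all.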
Inter-batch imbalance
where the number of concurrent micro-batches is limited by the pipeline depth.
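Why the micro-batch count is tied to pipeline depth can be sketched with a toy autoregressive decode loop. It assumes unit compute time per stage and round-robin scheduling; the helper `decode_utilization` is hypothetical. Each micro-batch's new token must return to the first stage before its next decode step can begin, so with P stages and fewer than P micro-batches in flight, some stages are necessarily idle.

```python
def decode_utilization(num_stages: int, num_microbatches: int, steps: int) -> float:
    """Toy autoregressive pipeline (hypothetical helper, not from gLLM).

    Assumes every stage takes 1 time unit per micro-batch, stages serve
    micro-batches round-robin, and a micro-batch can only start its next
    decode step on stage 0 after its previous step has left the last stage.
    Returns busy stage-time divided by total stage-time.
    """
    stage_free = [0] * num_stages        # time each stage becomes free
    mb_ready = [0] * num_microbatches    # time each micro-batch's next input exists
    busy = 0
    makespan = 0
    for _ in range(steps):
        for m in range(num_microbatches):
            t = mb_ready[m]
            for s in range(num_stages):
                start = max(t, stage_free[s])
                t = start + 1            # unit compute time (assumption)
                stage_free[s] = t
                busy += 1
            mb_ready[m] = t              # new token is fed back to stage 0
            makespan = max(makespan, t)
    return busy / (num_stages * makespan)


for m in (1, 2, 4):
    print(m, round(decode_utilization(num_stages=4, num_microbatches=m, steps=10), 3))
```

In this model, one micro-batch on four stages gives exactly 0.25 utilization. Four micro-batches push it above 0.9, and the remaining gap comes only from pipeline warm-up and drain.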
With PP, stages only need to pass their result values (the activations) on to the next stage.
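To get a rough sense of how little PP communicates, the arithmetic can be sketched as follows. This assumes only the hidden states cross each stage boundary, with a hidden size of 4096 and fp16 (2 bytes per element). These numbers are illustrative, not taken from the paper, and `pp_transfer_bytes` is a hypothetical helper.

```python
from typing import Tuple


def pp_transfer_bytes(num_tokens: int, hidden_size: int, num_stages: int,
                      bytes_per_elem: int = 2) -> Tuple[int, int]:
    """Hypothetical helper: bytes of hidden states sent in one forward pass.

    Assumes each stage boundary carries one [num_tokens, hidden_size] tensor
    (fp16/bf16 by default) and ignores any extra metadata.
    Returns (bytes per stage boundary, bytes over all num_stages - 1 boundaries).
    """
    per_boundary = num_tokens * hidden_size * bytes_per_elem
    return per_boundary, per_boundary * (num_stages - 1)


per_boundary, total = pp_transfer_bytes(num_tokens=32, hidden_size=4096, num_stages=4)
print(per_boundary // 1024, "KiB per boundary,", total // 1024, "KiB per forward pass")
```

Under these assumptions, a 32-token decode micro-batch on 4 stages moves 256 KiB per stage boundary and 768 KiB per forward pass.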