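gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling. Several parallelism methods. Attempts to eliminate pipeline bubbles. LLM inference currently has two kinds of imbalance. Inter-stage imbalance: inter-stage dependency, where a stage cannot begin comput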
MoE Survey. withinmiaov/A-Survey-on-Mixture-of-Experts-in-LLMs: The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". Understanding Mixture of Experts in One Article