Kavli Affiliate: Jia Liu | First 5 Authors: Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian | Summary: In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two […]
Continue reading: Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs