AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models

Kavli Affiliate: Wei Gao

| First 5 Authors: Jihu Guo

| Summary:

Pipeline parallelism is widely used to train large language models (LLMs).
However, increasing heterogeneity in model architectures exacerbates pipeline
bubbles, thereby reducing training efficiency. Existing approaches overlook the
co-optimization of model partition, model placement, and workload scheduling,
resulting in limited efficiency improvement or even performance degradation. In
response, we propose AdaPtis, an LLM training system that supports adaptive
pipeline parallelism. First, we develop a pipeline performance model to
accurately estimate training throughput. Second, AdaPtis jointly optimizes
model partition, model placement, and workload scheduling policies guided by
this performance model. Third, we design a unified pipeline executor that
efficiently supports the execution of diverse pipeline strategies. Extensive
experiments show that AdaPtis achieves an average speedup of 1.42x (up to
2.14x) over Megatron-LM I-1F1B across various LLM architectures and scales.
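The abstract does not give the performance model itself; the sketch below is only a minimal illustration of the kind of estimate such a model might produce for a 1F1B-style schedule, assuming a simple per-stage cost model that ignores communication, memory limits, and interleaving. The function name and the numbers are illustrative, not from the paper.

```python
def estimate_pipeline_throughput(stage_times, num_microbatches, tokens_per_microbatch):
    """Coarse 1F1B-style makespan and bubble estimate for one training iteration.

    stage_times: per-stage forward+backward time (s) for a single microbatch.
    """
    p = len(stage_times)
    bottleneck = max(stage_times)
    # Steady state is paced by the slowest stage; fill and drain add roughly
    # one traversal of the remaining stages (a crude approximation).
    makespan = num_microbatches * bottleneck + (sum(stage_times) - bottleneck)
    # Useful compute summed over all p devices vs. total device-time.
    useful = num_microbatches * sum(stage_times)
    bubble_fraction = 1.0 - useful / (p * makespan)
    throughput = num_microbatches * tokens_per_microbatch / makespan
    return makespan, bubble_fraction, throughput


# Example: the same total work partitioned uniformly vs. with one slow stage,
# showing how heterogeneity inflates the pipeline bubble.
uniform = estimate_pipeline_throughput([0.25, 0.25, 0.25, 0.25], 32, 8192)
skewed = estimate_pipeline_throughput([0.20, 0.20, 0.20, 0.40], 32, 8192)
print(f"uniform: bubble={uniform[1]:.2%}, tokens/s={uniform[2]:.0f}")
print(f"skewed : bubble={skewed[1]:.2%}, tokens/s={skewed[2]:.0f}")
```

For uniform stages this reduces to the standard bubble fraction (p - 1) / (m + p - 1), while the skewed partition shows the bubble growing toward 40%, which is the kind of gap that joint partition, placement, and scheduling optimization aims to close.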

| Search Query: ArXiv Query: search_query=au:"Wei Gao"&id_list=&start=0&max_results=3
