Kavli Affiliate: Wei Gao
| First 5 Authors: Han Lu, , , ,
| Summary:
Synchronous Reinforcement Learning (RL) post-training has emerged as a
crucial step for enhancing Large Language Models (LLMs) with diverse
capabilities. However, many systems designed to accelerate RL post-training
still suffer from low resource utilization and limited scalability. We present
ROLL Flash, a system that extends ROLL with native support for asynchronous RL
post-training. ROLL Flash is built upon two core design principles:
fine-grained parallelism and rollout-train decoupling. Guided by these
principles, ROLL Flash provides flexible programming interfaces that enable a
fully asynchronous training architecture and support efficient rollout
mechanisms, including queue scheduling and environment-level asynchronous
execution. Through comprehensive theoretical analysis and extensive
experiments, we demonstrate that ROLL Flash significantly improves resource
utilization and scalability over synchronous RL post-training. ROLL Flash
achieves up to a 2.24x speedup on RLVR tasks and up to 2.72x on agentic tasks
under the same GPU budget as synchronous baselines. Furthermore, we implement several
popular off-policy algorithms and verify that asynchronous training can achieve
performance on par with synchronous training.
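To make the two rollout mechanisms named above concrete, here is a minimal
producer-consumer sketch of rollout-train decoupling with queue scheduling.
This is an illustration under assumed names (rollout_worker, ROLLOUT_QUEUE,
policy_version), not ROLL Flash's actual programming interface: rollout
workers generate trajectories continuously and push them into a bounded
queue, while the trainer consumes batches with no global synchronization
barrier between generation and updates.

# Minimal sketch of rollout-train decoupling via a bounded queue.
# All names and sizes here are illustrative assumptions, not the
# ROLL Flash API.
import queue
import threading
import time
import random

ROLLOUT_QUEUE: "queue.Queue[dict]" = queue.Queue(maxsize=8)
NUM_WORKERS = 4
NUM_TRAIN_STEPS = 10
policy_version = 0  # bumped by the trainer after each update


def rollout_worker(worker_id: int) -> None:
    """Generate trajectories continuously, decoupled from training."""
    while True:
        version = policy_version  # snapshot of the policy used to sample
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for LLM generation
        trajectory = {"worker": worker_id, "version": version,
                      "reward": random.random()}
        ROLLOUT_QUEUE.put(trajectory)  # blocks only if the trainer lags


def train() -> None:
    """Consume trajectories as they arrive; no per-step sync barrier."""
    global policy_version
    for step in range(NUM_TRAIN_STEPS):
        batch = [ROLLOUT_QUEUE.get() for _ in range(4)]
        staleness = [policy_version - t["version"] for t in batch]
        # An off-policy correction (e.g. importance weighting) would use
        # this staleness; here we only report it.
        policy_version += 1
        print(f"step {step}: mean reward "
              f"{sum(t['reward'] for t in batch) / len(batch):.3f}, "
              f"staleness {staleness}")


for i in range(NUM_WORKERS):
    threading.Thread(target=rollout_worker, args=(i,), daemon=True).start()
train()

Recording the policy version at sampling time is what allows the off-policy
algorithms mentioned in the summary to account for the staleness between
when a trajectory was generated and when it is used for an update.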
| Search Query: ArXiv Query: search_query=au:"Wei Gao"&id_list=&start=0&max_results=3