Kavli Affiliate: Wei Gao | First 5 Authors: Wei Gao, Wei Gao | Summary: Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from significant GPU underutilization, referred to as bubbles, caused by imbalanced response lengths within rollout […]
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training