RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training

Kavli Affiliate: Wei Gao | First 5 Authors: Wei Gao | Summary: Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from significant GPU underutilization, referred to as bubbles, caused by imbalanced response lengths within rollout […]


ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation

Kavli Affiliate: Wei Gao | First 5 Authors: Dekun Lu | Summary: End-to-end robot manipulation policies offer significant potential for enabling embodied agents to understand and interact with the world. Unlike traditional modular pipelines, end-to-end learning mitigates key limitations such as information loss between modules and feature misalignment caused by isolated […]

