Kavli Affiliate: Jing Wang
| First 5 Authors: Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck
| Summary:
We introduce DreamGen, a simple yet highly effective 4-stage pipeline for
training robot policies that generalize across behaviors and environments
through neural trajectories – synthetic robot data generated from video world
models. DreamGen leverages state-of-the-art image-to-video generative models,
adapting them to the target robot embodiment to produce photorealistic
synthetic videos of familiar or novel tasks in diverse environments. Since
these models generate only videos, we recover pseudo-action sequences using
either a latent action model or an inverse-dynamics model (IDM). Despite its
simplicity, DreamGen unlocks strong behavior and environment generalization: a
humanoid robot can perform 22 new behaviors in both seen and unseen
environments, while requiring teleoperation data from only a single
pick-and-place task in one environment. To evaluate the pipeline
systematically, we introduce DreamGen Bench, a video generation benchmark that
shows a strong correlation between benchmark performance and downstream policy
success. Our work establishes a promising new axis for scaling robot learning
well beyond manual data collection. Code available at
https://github.com/NVIDIA/GR00T-Dreams.
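
The abstract notes that the generated world-model videos contain no action labels, so pseudo-actions are recovered with a latent action model or an inverse-dynamics model (IDM). As a minimal sketch of the IDM idea only: a model that regresses the action taken between two consecutive frames, then labels a synthetic video frame pair by frame pair. The architecture, frame size, and `action_dim=7` below are illustrative assumptions, not the paper's actual design (see the repository linked above for the real implementation).

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Predicts the action taken between two consecutive frames.

    Hypothetical sketch of an IDM; dimensions and layers are
    illustrative, not taken from the DreamGen paper.
    """
    def __init__(self, action_dim: int = 7):
        super().__init__()
        # Shared CNN encoder, applied to each frame independently.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Regress the action from the concatenated frame embeddings;
        # LazyLinear infers the input width on first use.
        self.head = nn.LazyLinear(action_dim)

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.encoder(frame_t), self.encoder(frame_t1)], dim=-1)
        return self.head(z)

# Label a generated video with pseudo-actions, one frame pair at a time.
idm = InverseDynamicsModel(action_dim=7)
video = torch.randn(16, 3, 84, 84)           # placeholder frames from a video world model
pseudo_actions = idm(video[:-1], video[1:])  # shape: (15, 7), one action per transition
```

In practice such an IDM would be trained on the small set of real teleoperation trajectories (where ground-truth actions exist) and then applied to the synthetic videos to produce the "neural trajectories" used for policy training.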
| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3