EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling

Kavli Affiliate: Zheng Zhu

| First 5 Authors: Boyuan Wang, Boyuan Wang, , ,

| Summary:

The rapid advancement of Embodied AI has led to an increasing demand for
large-scale, high-quality real-world data. However, collecting such embodied
data remains costly and inefficient. As a result, simulation environments have
become a crucial surrogate for training robot policies. Yet, the significant
Real2Sim2Real gap remains a critical bottleneck, particularly in terms of
physical dynamics and visual appearance. To address this challenge, we propose
EmbodieDreamer, a novel framework that reduces the Real2Sim2Real gap from both
the physics and appearance perspectives. Specifically, we propose PhysAligner,
a differentiable physics module designed to reduce the Real2Sim physical gap.
It jointly optimizes robot-specific parameters such as control gains and
friction coefficients to better align simulated dynamics with real-world
observations. In addition, we introduce VisAligner, which incorporates a
conditional video diffusion model to bridge the Sim2Real appearance gap by
translating low-fidelity simulated renderings into photorealistic videos
conditioned on simulation states, enabling high-fidelity visual transfer.
Extensive experiments validate the effectiveness of EmbodieDreamer. The
proposed PhysAligner reduces physical parameter estimation error by 3.74%
compared to simulated annealing methods while improving optimization speed by
89.91%. Moreover, training robot policies in the generated photorealistic
environment leads to a 29.17% improvement in the average task success rate
across real-world tasks after reinforcement learning. Code, model and data will
be publicly available.

| Search Query: ArXiv Query: search_query=au:”Zheng Zhu”&id_list=&start=0&max_results=3