Kavli Affiliate: Zheng Zhu
| First 5 Authors: Liu Liu, Xiaofeng Wang, Guosheng Zhao, Keyu Li, Wenkang Qin
| Summary:
Imitation Learning has become a fundamental approach in robotic manipulation.
However, collecting large-scale real-world robot demonstrations is
prohibitively expensive. Simulators offer a cost-effective alternative, but the
sim-to-real gap makes it extremely challenging to scale. Therefore, we introduce
RoboTransfer, a diffusion-based video generation framework for robotic data
synthesis. Unlike previous methods, RoboTransfer integrates multi-view geometry
with explicit control over scene components, such as background and object
attributes. By incorporating cross-view feature interactions and global
depth/normal conditions, RoboTransfer ensures geometric consistency across
views. This framework allows fine-grained control, including background edits
and object swaps. Experiments demonstrate that RoboTransfer is capable of
generating multi-view videos with enhanced geometric consistency and visual
fidelity. In addition, policies trained on data generated by RoboTransfer
achieve a 33.3% relative improvement in success rate in the DIFF-OBJ
setting and a substantial 251% relative improvement in the more challenging
DIFF-ALL scenario. Explore more demos on our project page:
https://horizonrobotics.github.io/robot_lab/robotransfer
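
To make the two conditioning ideas in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of a block that (a) mixes features across camera views with attention and (b) injects projected depth/normal maps as a global geometric condition. The class and parameter names (CrossViewBlock, geo_proj) and all shapes are assumptions for illustration; the actual RoboTransfer model is a diffusion-based video generator whose implementation is not shown in this entry.

```python
# Hypothetical sketch, not the paper's code: cross-view feature interaction
# plus depth/normal conditioning, as described in the abstract.
import torch
import torch.nn as nn


class CrossViewBlock(nn.Module):
    """Attend across camera views so features stay geometrically consistent."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Depth (1 channel) + surface normal (3 channels) tokens are projected
        # into the feature space and added as a condition (an assumption here).
        self.geo_proj = nn.Linear(4, dim)

    def forward(self, tokens: torch.Tensor, geo: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, views, patches, dim); geo: (batch, views, patches, 4)
        b, v, p, d = tokens.shape
        x = tokens + self.geo_proj(geo)    # inject depth/normal condition
        x = x.reshape(b, v * p, d)         # flatten views so self-attention
        x, _ = self.attn(x, x, x)          # mixes features across all views
        return self.norm(x.reshape(b, v, p, d) + tokens)


# Toy usage: 2 camera views, 16 patch tokens each, 64-dim features.
block = CrossViewBlock(dim=64)
feats = torch.randn(1, 2, 16, 64)
geometry = torch.randn(1, 2, 16, 4)        # stand-in depth/normal tokens
print(block(feats, geometry).shape)        # torch.Size([1, 2, 16, 64])
```

Flattening the view axis before attention is one simple way to realize "cross-view feature interaction"; the paper may well use a different mechanism.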
| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3