Kavli Affiliate: Zheng Zhu
| First 5 Authors: Yuan Xu
| Summary:
Imitation-learning-based policies perform well in robotic manipulation, but
when trained from a single egocentric viewpoint they often degrade under
*egocentric viewpoint shifts*. To address this issue, we present
**EgoDemoGen**, a framework that generates *paired* novel egocentric
demonstrations by retargeting actions into the novel egocentric frame and
synthesizing the corresponding egocentric observation videos with the
proposed generative video repair model **EgoViewTransfer**, which is
conditioned on a novel-viewpoint reprojected scene video and a robot-only
video rendered from the retargeted joint actions. EgoViewTransfer is
finetuned from a pretrained video generation model using a self-supervised
double-reprojection strategy. We evaluate EgoDemoGen both in simulation
(RoboTwin2.0) and on a real-world robot. After training with a mixture of
EgoDemoGen-generated novel egocentric demonstrations and original standard
egocentric demonstrations, the policy success rate improves by an
**absolute** **+17.0%** for the standard egocentric viewpoint and by
**+17.7%** for novel egocentric viewpoints in simulation. On the real-world
robot, the **absolute** improvements are **+18.3%** and **+25.8%**. Moreover,
performance continues to improve as the proportion of EgoDemoGen-generated
demonstrations increases, with diminishing returns. These results demonstrate
that EgoDemoGen provides a practical route to egocentric viewpoint-robust
robotic manipulation.
| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3
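
The abstract names a self-supervised double-reprojection strategy and conditioning on a novel-viewpoint reprojected scene video, but gives no implementation details. The sketch below is only one plausible reading of that idea, assuming a pinhole camera with known per-pixel depth; the names `reproject` and `double_reprojection_pair` and the nearest-neighbor forward warping are illustrative choices, not the paper's actual method.

```python
import numpy as np

def reproject(image, depth, K, T_src_to_tgt):
    """Forward-warp `image` from a source camera into a target camera using
    per-pixel depth and a pinhole model. Pixels without a valid source
    correspondence stay black -- exactly the kind of holes a generative
    repair model would be asked to fill."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)
    pts_src = np.linalg.inv(K) @ pix * depth.reshape(1, -1)           # 3 x N, source frame
    pts_src_h = np.vstack([pts_src, np.ones((1, pts_src.shape[1]))])  # homogeneous coords
    pts_tgt = (T_src_to_tgt @ pts_src_h)[:3]                          # 3 x N, target frame
    z = pts_tgt[2]
    uv = K @ (pts_tgt / np.clip(z, 1e-6, None))                       # project to pixels
    u, v = np.round(uv[0]).astype(int), np.round(uv[1]).astype(int)
    valid = (depth.reshape(-1) > 0) & (z > 1e-6) & \
            (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.zeros_like(image)
    warped[v[valid], u[valid]] = image.reshape(-1, image.shape[-1])[valid]
    return warped

def double_reprojection_pair(rgb, depth, K, T_std_to_novel):
    """Round-trip a standard-viewpoint frame through a sampled novel viewpoint
    and back. The (degraded, target) pair provides self-supervision: the
    round-tripped frame carries occlusion holes and warping artifacts, while
    the original frame is the clean repair target."""
    to_novel = reproject(rgb, depth, K, T_std_to_novel)
    # Illustrative shortcut: warp the depth map the same way to get an
    # approximate novel-view depth (holes simply remain empty).
    depth_novel = reproject(depth[..., None], depth, K, T_std_to_novel)[..., 0]
    back = reproject(to_novel, depth_novel, K, np.linalg.inv(T_std_to_novel))
    return back, rgb

# Tiny usage example with dummy data and a 5 cm lateral camera shift.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
rgb = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 1.5)
T = np.eye(4)
T[0, 3] = 0.05
degraded, target = double_reprojection_pair(rgb, depth, K, T)
```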
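
The reported gains come from training on a mixture of original and generated demonstrations, swept over the generated proportion. A minimal way to build such a mixture might look like the following; `mix_demonstrations` and its `generated_fraction` knob are hypothetical names, not the paper's training code.

```python
import random

def mix_demonstrations(original, generated, generated_fraction, seed=0):
    """Keep every original standard-viewpoint demo and subsample generated
    novel-viewpoint demos so they make up `generated_fraction` of the final
    training set (the proportion the abstract sweeps)."""
    assert 0.0 <= generated_fraction < 1.0
    rng = random.Random(seed)
    n_gen = round(len(original) * generated_fraction / (1.0 - generated_fraction))
    n_gen = min(n_gen, len(generated))
    mixed = list(original) + rng.sample(list(generated), n_gen)
    rng.shuffle(mixed)
    return mixed

# e.g. a 50/50 mix of original and EgoDemoGen-style demonstrations
train_set = mix_demonstrations(original=list(range(100)),
                               generated=list(range(100, 300)),
                               generated_fraction=0.5)
```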