Kavli Affiliate: Zheng Zhu
| First 5 Authors: Guosheng Zhao, Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Guan Huang
| Summary:
Closed-loop simulation is essential for advancing end-to-end autonomous
driving systems. Contemporary sensor simulation methods, such as NeRF and 3DGS,
rely predominantly on conditions closely aligned with training data
distributions, which are largely confined to forward-driving scenarios.
Consequently, these methods face limitations when rendering complex maneuvers
(e.g., lane change, acceleration, deceleration). Recent advancements in
autonomous-driving world models have demonstrated the potential to generate
diverse driving videos. However, these approaches remain constrained to 2D
video generation, inherently lacking the spatiotemporal coherence required to
capture the intricacies of dynamic driving environments. In this paper, we
introduce DriveDreamer4D, which enhances 4D driving scene representation by
leveraging world model priors. Specifically, we utilize the world model as a
data engine to synthesize novel-trajectory videos based on real-world driving
data. Notably, we explicitly leverage structured conditions to control the
spatial-temporal consistency of foreground and background elements, so that
the generated data adhere closely to traffic constraints. To
our knowledge, DriveDreamer4D is the first to utilize video generation models
for improving 4D reconstruction in driving scenarios. Experimental results
show that DriveDreamer4D significantly enhances generation quality under novel
trajectory views, achieving relative FID improvements of 24.5%, 39.0%, and
10.5% over PVG, S³Gaussian, and Deformable-GS, respectively. Moreover,
DriveDreamer4D markedly enhances the spatiotemporal coherence of driving
agents, as verified by a comprehensive user study and by relative increases of
20.3%, 42.0%, and 13.7% in the NTA-IoU metric.
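
As a minimal illustration (not taken from the paper), the reported percentages
can be read as conventional relative improvements: for FID, which is
lower-is-better, (baseline - ours) / baseline; for NTA-IoU, which is
higher-is-better, (ours - baseline) / baseline. The sketch below assumes this
convention, and all numeric values in it are hypothetical placeholders rather
than results from DriveDreamer4D or its baselines.

# Sketch: conventional relative-improvement arithmetic (assumed, not from the paper).

def relative_fid_improvement(baseline_fid: float, ours_fid: float) -> float:
    """FID is lower-is-better: improvement = (baseline - ours) / baseline."""
    return 100.0 * (baseline_fid - ours_fid) / baseline_fid

def relative_ntaiou_increase(baseline_iou: float, ours_iou: float) -> float:
    """NTA-IoU is higher-is-better: increase = (ours - baseline) / baseline."""
    return 100.0 * (ours_iou - baseline_iou) / baseline_iou

if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    print(f"FID improvement: {relative_fid_improvement(100.0, 75.5):.1f}%")   # 24.5%
    print(f"NTA-IoU increase: {relative_ntaiou_increase(0.50, 0.60):.1f}%")  # 20.0%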
| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3