Kavli Affiliate: Zheng Zhu
| First 5 Authors: Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu
| Summary:
Numerous studies have investigated the pivotal role of reliable 3D volume
representation in scene perception tasks, such as multi-view stereo (MVS) and
semantic scene completion (SSC). They typically construct 3D probability
volumes directly with geometric correspondence, attempting to fully address the
scene perception tasks in a single forward pass. However, such a single-step
solution makes it hard to learn accurate and convincing volumetric probability,
especially in challenging regions like unexpected occlusions and complicated
light reflections. Therefore, this paper proposes to decompose the complicated
3D volume representation learning into a sequence of generative steps to
facilitate fine and reliable scene perception. Considering the recent advances
achieved by strong generative diffusion models, we introduce a multi-step
learning framework, dubbed VPD, dedicated to progressively refining the
Volumetric Probability in a Diffusion process. Extensive experiments are
conducted on scene perception tasks, including MVS and SSC, to validate the
efficacy of our method in
learning reliable volumetric representations. Notably, for the SSC task, our
work stands out as the first to surpass LiDAR-based methods on the
SemanticKITTI dataset.
| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3