Kavli Affiliate: Zheng Zhu
| First 5 Authors: Bohan Li, Jingxin Dong, Yunnan Wang, Jinming Liu, Lianying Yin
| Summary:
Recent works have explored the fundamental role of depth estimation in
multi-view stereo (MVS) and semantic scene completion (SSC). They generally
construct 3D cost volumes to explore geometric correspondence in depth, and
estimate such volumes in a single step relying directly on the ground truth
approximation. However, such problem cannot be thoroughly handled in one step
due to complex empirical distributions, especially in challenging regions like
occlusions, reflections, etc. In this paper, we formulate the depth estimation
task as a multi-step distribution approximation process, and introduce a new
paradigm of modeling the Volumetric Probability Distribution progressively
(step-by-step) following a Markov chain with Diffusion models (VPDD).
Specifically, to constrain the multi-step generation of volume in VPDD, we
construct a meta volume guidance and a confidence-aware contextual guidance as
conditional geometry priors to facilitate the distribution approximation. For
the sampling process, we further investigate an online filtering strategy to
maintain consistency in volume representations for stable training. Experiments
demonstrate that our plug-and-play VPDD outperforms the state-of-the-arts for
tasks of MVS and SSC, and can also be easily extended to different baselines to
get improvement. It is worth mentioning that we are the first camera-based work
that surpasses LiDAR-based methods on the SemanticKITTI dataset.
| Search Query: ArXiv Query: search_query=au:”Zheng Zhu”&id_list=&start=0&max_results=3