Kavli Affiliate: Jing Wang
| First 5 Authors: Qiang Zhou, Chaohui Yu, Jingliang Li, Yuang Liu, Jing Wang
| Summary:
Compared with multi-stage self-supervised multi-view stereo (MVS) methods,
end-to-end (E2E) approaches have received more attention due to their concise
and efficient training pipelines. Recent E2E self-supervised MVS approaches
have integrated third-party models (optical flow models, semantic segmentation
models, NeRF models, etc.) to provide additional consistency constraints,
which increases GPU memory consumption and complicates the model's structure
and training pipeline. In this work, we propose an efficient
framework for end-to-end self-supervised MVS, dubbed ES-MVSNet. To alleviate
the high memory consumption of current E2E self-supervised MVS frameworks, we
present a memory-efficient architecture that reduces memory usage by 43%
without compromising model performance. Furthermore, with the novel design of
asymmetric view selection policy and region-aware depth consistency, we achieve
state-of-the-art performance among E2E self-supervised MVS methods, without
relying on third-party models for additional consistency signals. Extensive
experiments on the DTU and Tanks & Temples benchmarks demonstrate that
ES-MVSNet achieves state-of-the-art performance among E2E self-supervised MVS
methods and performance competitive with many supervised and multi-stage
self-supervised methods.
| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3
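
As a rough illustration of the region-aware depth consistency idea mentioned
in the summary, the PyTorch sketch below compares a reference-view depth map
with a source-view depth map that has already been warped into the reference
frame, restricting the loss to textured image regions. All function names, the
gradient-based region heuristic, and the threshold value are assumptions made
for illustration; the paper's actual formulation may differ.

# Hypothetical sketch of a region-aware depth-consistency loss for
# self-supervised MVS. The masking heuristic and all names are assumptions,
# not the ES-MVSNet implementation.
import torch
import torch.nn.functional as F

def image_gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Per-pixel gradient magnitude of a (B, 3, H, W) image, used as a crude
    proxy for textured ('reliable') regions."""
    gray = img.mean(dim=1, keepdim=True)          # (B, 1, H, W)
    dx = gray[..., :, 1:] - gray[..., :, :-1]     # horizontal differences
    dy = gray[..., 1:, :] - gray[..., :-1, :]     # vertical differences
    dx = F.pad(dx, (0, 1, 0, 0))                  # pad width back to W
    dy = F.pad(dy, (0, 0, 0, 1))                  # pad height back to H
    return torch.sqrt(dx ** 2 + dy ** 2 + 1e-12)

def region_aware_depth_consistency(depth_ref: torch.Tensor,
                                   depth_src_warped: torch.Tensor,
                                   ref_img: torch.Tensor,
                                   grad_thresh: float = 0.05) -> torch.Tensor:
    """L1 consistency between the reference-view depth and a source-view depth
    warped into the reference frame, masked to textured regions where
    photometric self-supervision is assumed trustworthy.

    depth_ref, depth_src_warped: (B, 1, H, W); ref_img: (B, 3, H, W).
    """
    # Hypothetical region mask: keep pixels with sufficient image gradient.
    mask = (image_gradient_magnitude(ref_img) > grad_thresh).float()
    diff = torch.abs(depth_ref - depth_src_warped)
    return (diff * mask).sum() / mask.sum().clamp(min=1.0)

if __name__ == "__main__":
    B, H, W = 2, 64, 80
    ref_img = torch.rand(B, 3, H, W)
    depth_ref = torch.rand(B, 1, H, W) * 5.0
    depth_src_warped = depth_ref + 0.01 * torch.randn(B, 1, H, W)
    loss = region_aware_depth_consistency(depth_ref, depth_src_warped, ref_img)
    print(f"region-aware depth consistency loss: {loss.item():.4f}")

Masking the depth term to textured regions reflects the common observation
that photometric losses are unreliable in textureless areas, which is one
plausible reading of "region-aware"; the cross-view warping step itself is
omitted here for brevity.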