Kavli Affiliate: Zheng Zhu
| First 5 Authors: Xianda Guo, Wenjie Yuan, Yunpeng Zhang, Tian Yang, Chenming Zhang
| Summary:
Depth estimation has been widely studied and serves as the fundamental step
of 3D perception for intelligent vehicles. Though significant progress has been
made in monocular depth estimation in the past decades, these attempts are
mainly conducted on the KITTI benchmark with only front-view cameras, which
ignores the correlations across surround-view cameras. In this paper, we
propose S3Depth, a Simple Baseline for Supervised Surround-view Depth
Estimation, to jointly predict the depth maps across multiple surrounding
cameras. Specifically, we employ a global-to-local feature extraction module
which combines CNN with transformer layers for enriched representations.
Further, the Adjacent-view Attention mechanism is proposed to enable the
intra-view and inter-view feature propagation. The former is achieved by the
self-attention module within each view, while the latter is realized by the
adjacent attention module, which computes the attention across multi-cameras to
exchange the multi-scale representations across surround-view feature maps.
Extensive experiments show that our method achieves superior performance over
existing state-of-the-art methods on both DDAD and nuScenes datasets.
| Search Query: ArXiv Query: search_query=au:”Zheng Zhu”&id_list=&start=0&max_results=3