UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation

Kavli Affiliate: Jing Wang

| First 5 Authors: Jian Wang, Jing Wang, Shenghui Rong, Bo He,

| Summary:

Underwater monocular depth estimation serves as the foundation for tasks such
as 3D reconstruction of underwater scenes. However, because light is absorbed
and scattered by the water medium, underwater images are formed through a
distinctive imaging process, which makes it challenging to estimate depth
accurately from a single image. Existing methods fail to consider the unique
characteristics of underwater environments, leading to inaccurate estimates
and limited generalization. Furthermore, underwater depth estimation requires
extracting and fusing both local and global features, which existing methods
have not fully explored. This paper presents UMono, an end-to-end learning
framework for underwater monocular depth estimation that incorporates
characteristics of the underwater image formation model into the network
architecture and effectively utilizes both local and global features of
underwater images. Experimental results demonstrate that the proposed method
is effective for underwater monocular depth estimation and outperforms
existing methods in both quantitative and qualitative analyses.
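The summary does not say which underwater image formation model UMono builds on. A common simplified form in the underwater-vision literature writes each color channel c as I_c = J_c * t_c + B_c * (1 - t_c), with transmission t_c = exp(-beta_c * d), where J is the scene radiance, B the veiling light (backscatter), beta_c a per-channel attenuation coefficient, and d the camera-to-scene range. The sketch below assumes that form. The names `transmission`, `observed_intensity`, `depth_from_transmission` and the `BETA` values are hypothetical and illustrative only; they are not taken from the paper.

```python
import math

# Hypothetical per-channel attenuation coefficients (1/m). Red light is absorbed
# fastest in water and blue slowest; these values are illustrative, not from UMono.
BETA = {"r": 0.60, "g": 0.10, "b": 0.05}


def transmission(depth_m: float, beta: float) -> float:
    """Fraction of the direct signal that survives a path of depth_m (Beer-Lambert)."""
    return math.exp(-beta * depth_m)


def observed_intensity(j: float, b_inf: float, depth_m: float, beta: float) -> float:
    """Simplified formation model: attenuated direct signal plus backscatter.

    j: scene radiance without water, in [0, 1].
    b_inf: veiling light (backscatter at infinite range), in [0, 1].
    """
    t = transmission(depth_m, beta)
    return j * t + b_inf * (1.0 - t)


def depth_from_transmission(t: float, beta: float) -> float:
    """Invert Beer-Lambert, d = -ln(t) / beta, valid for 0 < t <= 1."""
    if not 0.0 < t <= 1.0:
        raise ValueError("transmission must be in (0, 1]")
    return -math.log(t) / beta


if __name__ == "__main__":
    # Hypothetical scene radiance and veiling light for one pixel.
    scene = {"r": 0.8, "g": 0.6, "b": 0.5}
    veil = {"r": 0.05, "g": 0.35, "b": 0.45}
    for d in (1.0, 5.0, 10.0):
        pixel = {
            c: round(observed_intensity(scene[c], veil[c], d, BETA[c]), 3)
            for c in "rgb"
        }
        print(f"range {d:4.1f} m -> {pixel}")
```

Because transmission decays exponentially with range, and at a different rate for each channel, the color cast and haze in an underwater image carry depth cues. This is the kind of physical prior the summary refers to, although how UMono encodes it in its network is not described here.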

| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3