DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Kavli Affiliate: Zheng Zhu


| Summary:

Monocular depth estimation is a challenging task that predicts the pixel-wise
depth from a single 2D image. Current methods typically model this problem as a
regression or classification task. We propose DiffusionDepth, a new approach
that reformulates monocular depth estimation as a denoising diffusion process.
It learns an iterative process that 'denoises' a random depth distribution
into a depth map, guided by monocular visual conditions. The process runs in
a latent space defined by a dedicated depth encoder and decoder. Instead of
diffusing ground-truth (GT) depth, the model learns to reverse the process of
diffusing its own refined depth into a random depth distribution. This
self-diffusion formulation overcomes the difficulty of applying generative
models to scenarios with sparse GT depth. Refining the depth estimate step by
step in this way benefits the task and is well suited to generating accurate,
highly detailed depth maps. Experimental results on the KITTI and NYU-Depth-V2
datasets suggest that a simple yet efficient diffusion approach can reach
state-of-the-art performance in both indoor and outdoor scenarios with
acceptable inference time.
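To make the denoising formulation more concrete, here is a minimal NumPy sketch of a generic latent diffusion loop. It rests on assumptions not stated in the summary: a linear noise schedule, a network that predicts the injected noise, and a deterministic DDIM-style sampler (eta = 0). `toy_denoiser` is a hypothetical stand-in for the learned, image-conditioned network; it simply treats the conditioning tensor as the clean latent, so the sampler should recover it exactly. `z0_self` in `training_loss` stands for a latent of the model's own refined depth, following the self-diffusion idea above; how that latent is produced is not shown. This is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (a common DDPM default; an assumption here).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)  # alpha_bar_t for t = 0..num_steps-1

def q_sample(z0, t, alpha_bars, noise):
    # Forward diffusion: jump straight from a clean latent z0 to step t.
    ab = alpha_bars[t]
    return np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * noise

def toy_denoiser(z_t, t, cond, alpha_bars):
    # Hypothetical stand-in for the learned network: predicts the noise in z_t
    # by treating the conditioning tensor `cond` as the clean latent.
    ab = alpha_bars[t]
    return (z_t - np.sqrt(ab) * cond) / np.sqrt(1.0 - ab)

def training_loss(z0_self, t, cond, alpha_bars, rng):
    # Self-diffusion idea: noise the model's own refined latent (not GT depth),
    # then score how well the denoiser recovers the injected noise.
    noise = rng.standard_normal(z0_self.shape)
    z_t = q_sample(z0_self, t, alpha_bars, noise)
    eps_hat = toy_denoiser(z_t, t, cond, alpha_bars)
    return np.mean((eps_hat - noise) ** 2)

def ddim_sample(cond, alpha_bars, steps, rng):
    # Deterministic DDIM-style reverse process (eta = 0), starting from noise.
    z = rng.standard_normal(cond.shape)
    timesteps = np.linspace(len(alpha_bars) - 1, 0, steps).astype(int)
    for i, t in enumerate(timesteps):
        eps = toy_denoiser(z, t, cond, alpha_bars)
        ab = alpha_bars[t]
        z0_hat = (z - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)  # current clean estimate
        if i == len(timesteps) - 1:
            z = z0_hat
        else:
            ab_prev = alpha_bars[timesteps[i + 1]]
            z = np.sqrt(ab_prev) * z0_hat + np.sqrt(1.0 - ab_prev) * eps
    return z  # a depth latent; a depth decoder would map it to a depth map

rng = np.random.default_rng(0)
alpha_bars = make_schedule()
cond = rng.standard_normal((4, 8, 8))  # hypothetical latent shape (C, H, W)
depth_latent = ddim_sample(cond, alpha_bars, steps=20, rng=rng)
loss = training_loss(cond, 500, cond, alpha_bars, rng)
```

Because the toy denoiser is an oracle, `depth_latent` matches `cond` and `loss` is essentially zero. In a real model, `toy_denoiser` would be replaced by a trained network conditioned on image features, and the final latent would be passed through the depth decoder; each reverse step refines the clean estimate `z0_hat`, which is the step-by-step refinement the summary describes.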

