Robust Single-Stage Fully Sparse 3D Object Detection via Detachable Latent Diffusion

Kavli Affiliate: Jing Wang

| First 5 Authors: Wentao Qu

| Summary:

Denoising Diffusion Probabilistic Models (DDPMs) have shown success in robust
3D object detection tasks. Existing methods often rely on score matching from
3D boxes or on pre-trained diffusion priors. However, they typically require
multi-step iterations at inference, which limits efficiency. To address this,
we propose a Robust single-stage fully Sparse 3D object Detection Network with
a Detachable Latent Framework (DLF) of DDPMs, named RSDNet. Specifically,
RSDNet learns the denoising process in
latent feature spaces through lightweight denoising networks such as
multi-level denoising autoencoders (DAEs). This enables RSDNet to effectively understand
scene distributions under multi-level perturbations, achieving robust and
reliable detection. Meanwhile, we reformulate the noising and denoising
mechanisms of DDPMs, enabling DLF to construct multi-type and multi-level noise
samples and targets, enhancing RSDNet's robustness to multiple perturbations.
Furthermore, a semantic-geometric conditional guidance is introduced to
perceive object boundaries and shapes, alleviating the missing center feature
problem in sparse representations and enabling RSDNet to operate in a fully
sparse detection pipeline. Moreover, the detachable denoising network
design of DLF enables RSDNet to perform single-step detection at inference,
further improving efficiency. Extensive experiments on public benchmarks show
that RSDNet outperforms existing methods, achieving state-of-the-art detection
performance.
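
The summary only sketches the mechanism at a high level. Below is a minimal, generic illustration (not the authors' implementation) of the core idea it describes: train a lightweight denoising autoencoder on latent features perturbed at several noise levels, then detach it so inference needs only a single forward pass. All names (`LatentDAE`, `perturb`, `noise_levels`) and values are hypothetical and chosen for illustration.

```python
# Hypothetical sketch of multi-level latent denoising with a detachable DAE.
import torch
import torch.nn as nn

class LatentDAE(nn.Module):
    """Lightweight denoising autoencoder over latent feature vectors."""
    def __init__(self, dim: int = 256, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, z_noisy: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(z_noisy))

def perturb(z: torch.Tensor, sigma: float) -> torch.Tensor:
    """Gaussian perturbation of latent features at one noise level."""
    return z + sigma * torch.randn_like(z)

# --- training: denoise latent features across multiple noise levels ---
backbone_dim = 256
dae = LatentDAE(backbone_dim)
optimizer = torch.optim.Adam(dae.parameters(), lr=1e-3)
noise_levels = [0.05, 0.1, 0.2]            # illustrative multi-level perturbations

z_clean = torch.randn(32, backbone_dim)     # stand-in for backbone latent features
optimizer.zero_grad()
loss = sum(
    nn.functional.mse_loss(dae(perturb(z_clean, s)), z_clean)
    for s in noise_levels
)
loss.backward()
optimizer.step()

# --- inference: the DAE is a detachable module, so detection can run in one step ---
with torch.no_grad():
    z_test = torch.randn(32, backbone_dim)
    z_denoised = dae(z_test)                # single forward pass, no iterative sampling
```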

| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3

Read More