FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels

Kavli Affiliate: Feng Wang

| First 5 Authors: Lue Fan, Feng Wang, Naiyan Wang, Zhaoxiang Zhang,

| Summary:

LiDAR-based fully sparse architectures have garnered increasing attention.
FSDv1 stands out as a representative work, achieving impressive efficacy and
efficiency, albeit with intricate structures and handcrafted designs. In this
paper, we present FSDv2, an evolution that aims to simplify the previous FSDv1
while eliminating the inductive bias introduced by its handcrafted
instance-level representation, thus promoting better general applicability. To
this end, we introduce the concept of virtual voxels, which takes over the
clustering-based instance segmentation in FSDv1. Virtual voxels not only
address the notorious Center Feature Missing problem in fully sparse detectors
but also endow the framework with a more elegant and streamlined design.
Consequently, we develop a suite of components to
complement the virtual voxel concept, including a virtual voxel encoder, a
virtual voxel mixer, and a virtual voxel assignment strategy. Through empirical
validation, we demonstrate that the virtual voxel mechanism is functionally
similar to the handcrafted clustering in FSDv1 while being more general. We
conduct experiments on three large-scale datasets: Waymo Open Dataset,
Argoverse 2 dataset, and nuScenes dataset. Our results showcase
state-of-the-art performance on all three datasets, highlighting the
superiority of FSDv2 in long-range scenarios and its general applicability,
achieving competitive performance across diverse scenarios. Moreover, we provide
comprehensive experimental analysis to elucidate the workings of FSDv2. To
foster reproducibility and further research, we have open-sourced FSDv2 at
https://github.com/tusen-ai/SST.
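
To make the virtual-voxel idea above more concrete, here is a minimal, self-contained sketch in PyTorch-style Python: foreground points vote object centers, the voted centers are voxelized into virtual voxels, and virtual voxels are pooled alongside real voxels so a downstream sparse backbone could mix them. The voting head, voxel size, pooling scheme, and all helper names are illustrative assumptions for this sketch, not the actual FSDv2 implementation (see the linked repository for that).

```python
# Illustrative sketch only; shapes, voxel size, and pooling are assumptions,
# not the FSDv2 code.
import torch

def voxelize(points, voxel_size):
    """Map 3D points to integer voxel coordinates."""
    return torch.floor(points / voxel_size).long()

def scatter_mean_encode(coords, feats):
    """Average the features of all points that fall into the same voxel."""
    uniq, inv = torch.unique(coords, dim=0, return_inverse=True)
    pooled = torch.zeros(uniq.size(0), feats.size(1))
    count = torch.zeros(uniq.size(0), 1)
    pooled.index_add_(0, inv, feats)
    count.index_add_(0, inv, torch.ones(feats.size(0), 1))
    return uniq, pooled / count.clamp(min=1)

# Toy inputs: N foreground points with xyz coordinates and a C-dim feature each.
N, C = 128, 16
xyz = torch.rand(N, 3) * 50.0
feat = torch.rand(N, C)

# Hypothetical per-point center votes (in the paper these come from a voting head).
center_offsets = torch.rand(N, 3) - 0.5
voted_centers = xyz + center_offsets

# Real voxels come from raw points; virtual voxels come from voted centers.
# Virtual voxels cover the (often empty) object centers, which is how the
# abstract describes addressing the Center Feature Missing problem.
real_coords, real_feats = scatter_mean_encode(voxelize(xyz, 0.5), feat)
virt_coords, virt_feats = scatter_mean_encode(voxelize(voted_centers, 0.5), feat)

# Stand-in for the "virtual voxel mixer": concatenate both voxel sets so that
# a sparse backbone (sparse convolution in the real model) could fuse them.
mixed_coords = torch.cat([real_coords, virt_coords], dim=0)
mixed_feats = torch.cat([real_feats, virt_feats], dim=0)
print(mixed_coords.shape, mixed_feats.shape)
```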

| Search Query: ArXiv Query: search_query=au:"Feng Wang"&id_list=&start=0&max_results=3
