Kavli Affiliate: Ke Wang
| First 5 Authors: Jiabao Wang, Zhaojiang Liu, Qiang Meng, Liujiang Yan, Ke Wang
| Summary:
Occupancy prediction, which aims to predict the occupancy status within a
voxelized 3D environment, is quickly gaining momentum within the autonomous
driving community. Mainstream occupancy prediction works first discretize the
3D environment into voxels, then perform classification on such dense grids.
However, inspection of sample data reveals that the vast majority of voxels are
unoccupied. Performing classification on these empty voxels allocates
computational resources suboptimally, and reducing such empty voxels necessitates
complex algorithm designs. To this end, we present a novel perspective on the
occupancy prediction task: formulating it as a streamlined set prediction
paradigm without the need for explicit space modeling or complex sparsification
procedures. Our proposed framework, called OPUS, utilizes a transformer
encoder-decoder architecture to simultaneously predict occupied locations and
classes using a set of learnable queries. First, we employ the Chamfer
distance loss to scale the set-to-set comparison problem to unprecedented
magnitudes, making end-to-end training of such a model feasible. Subsequently,
semantic classes are adaptively assigned using nearest neighbor search based on
the learned locations. In addition, OPUS incorporates a suite of non-trivial
strategies to enhance model performance, including coarse-to-fine learning,
consistent point sampling, and adaptive re-weighting. Finally, compared
with current state-of-the-art methods, our lightest model achieves superior
RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model
surpasses previous best results by 6.1 RayIoU.
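
The two steps named in the summary, a Chamfer distance between predicted and ground-truth point sets followed by nearest-neighbor class assignment, can be illustrated with a minimal pure-Python sketch. It is not the OPUS implementation. The function names (`chamfer_distance`, `assign_labels`), the toy points, and the choice of a symmetric, mean-reduced Euclidean Chamfer distance are all assumptions made for illustration; the summary does not state which variant the authors use, nor the exact assignment rule.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed form: mean nearest-neighbor Euclidean distance from pred to gt,
    plus the same quantity from gt to pred. No one-to-one matching is needed,
    which is why this kind of loss stays cheap for large sets.
    """
    pred_to_gt = sum(min(math.sqrt(sq_dist(p, g)) for g in gt) for p in pred) / len(pred)
    gt_to_pred = sum(min(math.sqrt(sq_dist(g, p)) for p in pred) for g in gt) / len(gt)
    return pred_to_gt + gt_to_pred


def assign_labels(pred, gt, gt_labels):
    """Give each predicted point the class of its nearest ground-truth point.

    Hypothetical reading of "classes are adaptively assigned using nearest
    neighbor search based on the learned locations".
    """
    labels = []
    for p in pred:
        nearest = min(range(len(gt)), key=lambda i: sq_dist(p, gt[i]))
        labels.append(gt_labels[nearest])
    return labels


# Toy data: two occupied ground-truth locations and three predicted points.
gt_points = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
gt_labels = ["car", "road"]
pred_points = [(0.0, 3.0, 0.0), (4.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

loss = chamfer_distance(pred_points, gt_points)
targets = assign_labels(pred_points, gt_points, gt_labels)
print(round(loss, 4), targets)
```

This brute-force version compares every pair of points, so it is only suitable for small examples; a practical system would presumably use batched tensor operations or a spatial index for the nearest-neighbor search.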
| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3