Learning Robust 3D Representation from CLIP via Dual Denoising

Kavli Affiliate: Wei Gao | First 5 Authors: Shuqing Luo, Bowen Qu, Wei Gao | Summary: In this paper, we explore a critical yet under-investigated issue: how to learn robust and well-generalized 3D representations from pre-trained vision-language models such as CLIP. Previous works have demonstrated that cross-modal distillation can provide rich and useful […]
