Kavli Affiliate: Cheng Peng
| First 5 Authors: Jingxing Li, Yongjae Lee, Abhay Kumar Yadav, Cheng Peng, Rama Chellappa
| Summary:
Image matching is a key component of modern 3D vision algorithms, essential
for accurate scene reconstruction and localization. MASt3R redefines image
matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal
matching scheme that accelerates matching by orders of magnitude while
preserving theoretical guarantees. This approach has gained strong traction,
with DUSt3R and MASt3R collectively cited over 250 times in a short span,
underscoring their impact. However, despite its accuracy, MASt3R’s inference
speed remains a bottleneck. On an A40 GPU, latency per image pair is 198.16 ms,
mainly due to computational overhead from the ViT encoder-decoder and Fast
Reciprocal Nearest Neighbor (FastNN) matching.
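For readers unfamiliar with reciprocal matching, the following is a minimal brute-force sketch of the underlying idea: a pair (i, j) is kept only when descriptor j is the best match for i and i is the best match for j. It is not MASt3R's FastNN, which avoids this exhaustive search; the function names, the dot-product similarity, and the toy descriptors are assumptions made for illustration.

```python
def nearest(queries, targets):
    """For each query descriptor, return the index of the target
    descriptor with the highest dot-product similarity."""
    best = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, t)) for t in targets]
        best.append(max(range(len(scores)), key=scores.__getitem__))
    return best


def reciprocal_matches(desc_a, desc_b):
    """Keep (i, j) only if j is i's nearest neighbor in B
    and i is j's nearest neighbor in A."""
    a_to_b = nearest(desc_a, desc_b)
    b_to_a = nearest(desc_b, desc_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy 2-D descriptors (hypothetical values, not taken from any dataset).
desc_a = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
desc_b = [[0.9, 0.1], [0.1, 0.9]]

matches = reciprocal_matches(desc_a, desc_b)
print(matches)
```

The third descriptor in `desc_a` prefers `desc_b[0]`, but `desc_b[0]` prefers `desc_a[0]`, so that pair is discarded. The exhaustive version scores every pair of descriptors, which is the quadratic cost that fast reciprocal schemes aim to avoid.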
To address this, we introduce Speedy MASt3R, a post-training optimization
framework that enhances inference efficiency while maintaining accuracy. It
integrates multiple optimization techniques: FlashMatch, which leverages
FlashAttention v2 with tiling strategies for improved efficiency; GraphFusion,
which optimizes the computation graph via layer and tensor fusion with kernel
auto-tuning in TensorRT; and FastNN-Lite, a streamlined FastNN pipeline that
reduces memory access time from quadratic to linear while accelerating
block-wise correlation scoring through vectorized computation.
Additionally, it employs mixed-precision inference with FP16/FP32 hybrid
computations (HybridCast), achieving speedup while preserving numerical
precision. Evaluated on Aachen Day-Night, InLoc, 7-Scenes, ScanNet1500, and
MegaDepth1500, Speedy MASt3R achieves a 54% reduction in inference time (198 ms
to 91 ms per image pair) without sacrificing accuracy. This advancement enables
real-time 3D understanding, benefiting applications like mixed reality
navigation and large-scale 3D scene reconstruction.
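The motivation for FP16/FP32 hybrid computation, as opposed to running everything in FP16, can be illustrated with a small standard-library sketch. It keeps the inputs in FP16 but compares accumulating their sum in FP16 against accumulating in FP32. This is only a generic illustration of mixed precision; it is not the casting policy used by HybridCast, and the helper names are assumptions.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, round_fn):
    """Sum the values, rounding the running total after every addition
    to emulate an accumulator held in the given precision."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


# 10,000 identical inputs stored in FP16 (0.1 rounds to about 0.09998).
values = [to_fp16(0.1)] * 10000
exact = sum(values)

fp16_total = accumulate(values, to_fp16)  # FP16 inputs, FP16 accumulator
fp32_total = accumulate(values, to_fp32)  # FP16 inputs, FP32 accumulator

print(f"exact={exact:.4f} fp16={fp16_total:.4f} fp32={fp32_total:.4f}")
```

The FP16 accumulator stops growing once each increment is smaller than half the spacing between neighboring FP16 values, so its total ends far below the true sum, while the FP32 accumulator tracks it closely. Keeping bulk data in FP16 and precision-sensitive reductions in FP32 is the usual way mixed-precision schemes balance speed against accuracy.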
| Search Query: ArXiv Query: search_query=au:"Cheng Peng"&id_list=&start=0&max_results=3