FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models

Kavli Affiliate: Ke Wang

| First 5 Authors: Chuhao Liu, Ke Wang, Jieqi Shi, Zhijian Qiao, Shaojie Shen

| Summary:

Semantic mapping based on supervised object detectors is sensitive to
image distribution. In real-world environments, object detection and
segmentation performance can drop sharply, preventing the use of
semantic mapping in wider domains. On the other hand, vision-language
foundation models demonstrate strong zero-shot transferability across
data distributions, which provides an opportunity to
construct generalizable instance-aware semantic maps. Hence, this work explores
how to boost instance-aware semantic mapping with object detections generated
by foundation models. We propose a probabilistic label fusion method to
predict closed-set semantic classes from open-set label measurements. An
instance refinement module merges the over-segmented instances caused by
inconsistent segmentation. We integrate all the modules into a unified semantic
mapping system. Reading a sequence of RGB-D input, our system incrementally
reconstructs an instance-aware semantic map. We evaluate the zero-shot
performance of our method on the ScanNet and SceneNN datasets. Our method achieves
40.3 mean average precision (mAP) on the ScanNet semantic instance segmentation
task. It significantly outperforms the traditional semantic mapping method.
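
To make the idea of probabilistic label fusion concrete, the following is a minimal sketch of a recursive Bayesian update that maps a stream of open-set labels for one instance onto a closed-set class. It is an illustration under stated assumptions, not the authors' implementation: the class lists, the likelihood table `LIKELIHOOD`, and the function name `fuse_labels` are all hypothetical, and the probability values are made up.

```python
import math

# Hypothetical closed-set classes and open-set labels (illustrative only).
CLOSED_SET = ["chair", "sofa", "table"]
OPEN_SET = ["chair", "armchair", "couch", "desk"]

# Assumed likelihood p(open-set label | closed-set class).
# The values are invented for this sketch; each row sums to 1.
LIKELIHOOD = {
    "chair": {"chair": 0.60, "armchair": 0.30, "couch": 0.05, "desk": 0.05},
    "sofa":  {"chair": 0.05, "armchair": 0.25, "couch": 0.65, "desk": 0.05},
    "table": {"chair": 0.05, "armchair": 0.05, "couch": 0.05, "desk": 0.85},
}


def fuse_labels(observations, prior=None):
    """Return the posterior over closed-set classes given open-set labels.

    Observations are treated as conditionally independent given the class,
    so the update is a product of likelihoods (summed in log space).
    """
    if prior is None:
        # Uniform prior when nothing is known about the instance.
        prior = {c: 1.0 / len(CLOSED_SET) for c in CLOSED_SET}
    log_post = {c: math.log(prior[c]) for c in CLOSED_SET}
    for label in observations:
        for c in CLOSED_SET:
            log_post[c] += math.log(LIKELIHOOD[c][label])
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post.values())
    unnorm = {c: math.exp(v - peak) for c, v in log_post.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Three per-frame detections of the same instance.
posterior = fuse_labels(["armchair", "couch", "couch"])
predicted = max(posterior, key=posterior.get)
print(predicted, posterior)
```

In this sketch a single ambiguous "armchair" detection is outweighed by two "couch" detections, so the fused estimate settles on "sofa". In a real system the likelihood table would have to be estimated from data rather than written by hand.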

| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3
