Kavli Affiliate: Jing Wang | First 5 Authors: Yuqi Yang, Peng-Tao Jiang, Jing Wang, Hao Zhang, Kai Zhao | Summary: Multi-modal large language models (MLLMs) can understand image-language prompts and demonstrate impressive reasoning ability. In this paper, we extend MLLMs’ output by empowering MLLMs with segmentation ability. The extended MLLMs can both output language […]
Empowering Segmentation Ability to Multi-modal Large Language Models