OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

Kavli Affiliate: Zheng Zhu

| First 5 Authors: Zijian Zhou, Zheng Zhu, Holger Caesar, Miaojing Shi,

| Summary:

Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize
their relations, enabling a structured understanding of an image. Previous
methods are limited to predefined object and relation categories, which
restricts their application in open-world scenarios. With the rapid
development of large multimodal models (LMMs), significant progress has been
made in open-set object detection and segmentation, yet open-set relation
prediction in PSG remains unexplored. In this paper, we focus on the task of
open-set relation prediction integrated with a pretrained open-set panoptic
segmentation model to achieve true open-set panoptic scene graph generation
(OpenPSG). Our OpenPSG leverages LMMs to achieve open-set relation prediction
in an autoregressive manner. We introduce a relation query transformer to
efficiently extract visual features of object pairs and estimate whether a
relation exists between them. This existence estimate improves prediction
efficiency by filtering out irrelevant pairs. Finally, we design generation and judgement
instructions to perform open-set relation prediction in PSG autoregressively.
To our knowledge, we are the first to propose the open-set PSG task. Extensive
experiments demonstrate that our method achieves state-of-the-art performance
in open-set relation prediction and panoptic scene graph generation. Code is
available at https://github.com/franciszzj/OpenPSG.
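The summary says relation-existence estimates are used to filter object pairs before the LMM is asked about relations. The following is a minimal sketch of that filtering step and of the two instruction types, under loudly stated assumptions: the existence scores are hand-written logits standing in for the relation query transformer's output, and the helper names (`filter_pairs`, `generation_prompt`, `judgement_prompt`), the threshold, and the prompt wording are all hypothetical, not taken from the OpenPSG code.

```python
import math
from itertools import permutations


def sigmoid(x: float) -> float:
    """Map a raw relation-existence logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def filter_pairs(logits: list[list[float]], threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Keep ordered (subject, object) pairs whose existence probability >= threshold.

    logits[i][j] is a HYPOTHETICAL score for "object i relates to object j";
    in the paper such a score comes from a learned model, here it is given directly.
    """
    kept = []
    # permutations(..., 2) yields ordered pairs with i != j, so the diagonal is skipped
    for i, j in permutations(range(len(logits)), 2):
        p = sigmoid(logits[i][j])
        if p >= threshold:
            kept.append((i, j, p))
    # Most confident pairs first, so the (expensive) LMM queries can be prioritized
    kept.sort(key=lambda t: t[2], reverse=True)
    return kept


def generation_prompt(subject: str, obj: str) -> str:
    """Hypothetical 'generation' instruction: ask the LMM to name a relation."""
    return f"What is the relation between the {subject} and the {obj}? Answer with a short predicate."


def judgement_prompt(subject: str, obj: str, relation: str) -> str:
    """Hypothetical 'judgement' instruction: ask the LMM to verify a candidate relation."""
    return f"Is the {subject} {relation} the {obj}? Answer yes or no."


# Toy example: labels as an open-set segmenter might return them (hypothetical)
labels = ["person", "horse", "grass"]
logits = [
    [0.0, 2.0, 1.0],
    [-3.0, 0.0, 0.5],
    [-4.0, -2.0, 0.0],
]

for i, j, p in filter_pairs(logits, threshold=0.6):
    print(f"{p:.2f}", generation_prompt(labels[i], labels[j]))
```

With a threshold of 0.6, only 3 of the 6 ordered pairs survive, so only those pairs would be sent to the LMM; this is the efficiency gain the summary attributes to filtering. A judgement instruction such as `judgement_prompt("person", "horse", "riding")` would then check a specific candidate relation.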
