YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework

Kavli Affiliate: Ting Xu

| First 5 Authors: Dahang Wan, Rongsheng Lu, Yang Fang, Xianli Lang, Shuangbao Shu

| Summary:

Multispectral object detection, which integrates information from multiple
bands, can enhance detection accuracy and environmental adaptability, holding
great application potential across a wide range of fields. Although existing
methods have made progress in cross-modal interaction, low-light conditions,
and lightweight model design, challenges remain, including the lack of a
unified single-stage framework, the difficulty of balancing detection
performance against fusion strategy, and poorly balanced modality weighting.
To address these issues, we present YOLOv11-RGBT, a new comprehensive
multimodal object detection framework built on YOLOv11. We designed six
multispectral fusion modes and successfully applied them to models ranging
from YOLOv3 to YOLOv12, as well as RT-DETR. After reevaluating the importance
of the two modalities, we proposed a P3 mid-fusion strategy and a
multispectral controllable fine-tuning (MCF) strategy for multispectral
models. These improvements optimize feature fusion, reduce redundancy and
mismatches, and boost overall model performance. Experiments show that our
framework excels on three major open-source multispectral object detection
datasets, including LLVIP and FLIR. In particular, the MCF strategy
significantly enhanced model adaptability and robustness. On the FLIR
dataset, it consistently improved the mAP of YOLOv11 models by 3.41%-5.65%,
reaching a maximum of 47.61%, verifying the effectiveness of the framework
and strategies. The code is available at:
https://github.com/wandahangFY/YOLOv11-RGBT.
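
To make the P3 mid-fusion idea concrete, below is a minimal PyTorch sketch
that merges RGB and thermal feature maps at the stride-8 P3 level. The
`P3MidFusion` module name, the learnable per-modality gates, and the
concatenate-then-project operation are illustrative assumptions inspired by
the abstract's emphasis on modality weighting; they are not the repository's
actual implementation, which defines six distinct fusion modes.

```python
# Hypothetical sketch of mid-fusion at the P3 level; not the paper's code.
import torch
import torch.nn as nn


class P3MidFusion(nn.Module):
    """Fuse RGB and thermal feature maps at the P3 (stride-8) level."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable scalar weights per modality (an assumption reflecting the
        # abstract's point about modality weight allocation).
        self.w_rgb = nn.Parameter(torch.ones(1))
        self.w_ir = nn.Parameter(torch.ones(1))
        # Concatenate the gated features, then project back to the original
        # channel count with a 1x1 convolution.
        self.proj = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, p3_rgb: torch.Tensor, p3_ir: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, C, H/8, W/8) feature maps from parallel backbones.
        fused = torch.cat([self.w_rgb * p3_rgb, self.w_ir * p3_ir], dim=1)
        return self.proj(fused)


# Usage: fuse stride-8 features from two modality-specific backbone streams.
fusion = P3MidFusion(channels=256)
p3_rgb = torch.randn(1, 256, 80, 80)  # RGB-stream P3 feature map
p3_ir = torch.randn(1, 256, 80, 80)   # thermal-stream P3 feature map
out = fusion(p3_rgb, p3_ir)           # (1, 256, 80, 80), fed to neck/head
```

One appeal of fusing at P3 rather than at the input or the head is that each
modality keeps its own low-level feature extractor while the neck and
detection head are shared, which is one way to reduce the redundancy and
feature mismatch the abstract mentions.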

| Search Query: ArXiv Query: search_query=au:"Ting Xu"&id_list=&start=0&max_results=3
