Kavli Affiliate: Jing Wang | First 5 Authors: Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng | Summary: Multimodal Large Language Models (MLLMs) have shown significant progress in offline video understanding. However, applying these models to real-world scenarios, such as autonomous driving and human-computer interaction, presents unique challenges due to the need for […]
Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method