Kavli Affiliate: Long Zhang | First 5 Authors: Yun Wang, Yun Wang | Summary: Video Large Language Models (Video-LLMs) excel at general video understanding but struggle with long-form videos due to context window limits. Consequently, recent approaches focus on keyframe retrieval, condensing lengthy videos into a small set of informative frames. Despite their […]
Episodic Memory Representation for Long-form Video Understanding