Kavli Affiliate: Wei Gao
| First 5 Authors: Shangkun Sun, Xiaoyu Liang, Bowen Qu, Wei Gao,
| Summary:
The advent of next-generation video generation models like Sora
poses challenges for AI-generated content (AIGC) video quality assessment
(VQA). These models substantially mitigate flickering artifacts prevalent in
prior models, support longer and more complex text prompts, and generate longer videos
with intricate, diverse motion patterns. Conventional VQA methods designed for
simple text and basic motion patterns struggle to evaluate these content-rich
videos. To this end, we propose CRAVE
(Content-Rich AIGC Video Evaluator), designed specifically for evaluating
Sora-era AIGC videos. CRAVE introduces a multi-granularity text-temporal
fusion that aligns long-form, complex textual semantics with video dynamics.
Additionally, CRAVE leverages hybrid motion-fidelity modeling to assess
temporal artifacts.
Furthermore, because current AIGC VQA datasets contain only simple prompts
and content, we introduce CRAVE-DB, a benchmark featuring content-rich
videos from next-generation models paired with elaborate prompts. Extensive
experiments show that CRAVE achieves excellent results on
multiple AIGC VQA benchmarks, demonstrating a high degree of alignment with
human perception. All data and code will be publicly available at
https://github.com/littlespray/CRAVE.
| Search Query: ArXiv Query: search_query=au:"Wei Gao"&id_list=&start=0&max_results=3