Kavli Affiliate: Jing Wang | First 5 Authors: Jiasong Feng, Ao Ma, Jing Wang, Bo Cheng, Xiaodan Liang | Summary: Synthesizing motion-rich and temporally consistent videos remains a challenge in artificial intelligence, especially when dealing with extended durations. Existing text-to-video (T2V) models commonly employ spatial cross-attention for text control, equivalently guiding different frame generations without […]
Continue reading: FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance