FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis

Kavli Affiliate: Yi Zhou

| First 5 Authors: Ziqi Ni, Ao Fu, Yi Zhou

| Summary:

Achieving high-fidelity lip-speech synchronization in audio-driven talking
portrait synthesis remains challenging. While multi-stage pipelines or
diffusion models yield high-quality results, they suffer from high
computational costs. Some approaches perform well on specific individuals with
low resources, yet still exhibit mismatched lip movements. The aforementioned
methods are modeled in the pixel domain. We observed that there are noticeable
discrepancies in the frequency domain between the synthesized talking videos
and natural videos. Currently, no research on talking portrait synthesis has
considered this aspect. To address this, we propose a FREquency-modulated,
high-fidelity, and real-time Audio-driven talKing portrait synthesis framework,
named FREAK, which models talking portraits from the frequency domain
perspective, enhancing the fidelity and naturalness of the synthesized
portraits. FREAK introduces two novel frequency-based modules: 1) the Visual
Encoding Frequency Modulator (VEFM) to couple multi-scale visual features in
the frequency domain, better preserving visual frequency information and
reducing the gap in the frequency spectrum between synthesized and natural
frames, and 2) the Audio Visual Frequency Modulator (AVFM) to help the model
learn the talking pattern in the frequency domain and improve audio-visual
synchronization. Additionally, we jointly optimize the model in both the pixel
and frequency domains. Furthermore, FREAK supports seamless switching
between one-shot and video dubbing settings, offering enhanced flexibility. Due
to its superior performance, it can simultaneously support high-resolution
video results and real-time inference. Extensive experiments demonstrate that
our method synthesizes high-fidelity talking portraits with detailed facial
textures and precise lip synchronization in real time, outperforming
state-of-the-art methods.
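
The joint pixel-domain and frequency-domain optimization mentioned above can be
illustrated with a small sketch. The summary does not state the actual loss, so
everything here is an assumption made for illustration: the L1 distance, the
log-amplitude spectrum, the weight `freq_weight`, and the function names are all
hypothetical. The naive DFT is written out only to keep the example
self-contained; in practice an FFT library would be used.

```python
import cmath
import math


def dft2(frame):
    """Naive 2-D discrete Fourier transform of a list-of-lists grayscale frame."""
    rows, cols = len(frame), len(frame[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += frame[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def log_amplitude(frame):
    """Log-compressed amplitude spectrum; phase is discarded."""
    return [[math.log1p(abs(c)) for c in row] for row in dft2(frame)]


def mean_abs_diff(a, b):
    """Mean absolute (L1) difference between two equally sized 2-D grids."""
    count = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count


def joint_loss(pred, target, freq_weight=0.1):
    """Assumed joint objective: pixel L1 plus weighted spectrum L1."""
    pixel_term = mean_abs_diff(pred, target)
    freq_term = mean_abs_diff(log_amplitude(pred), log_amplitude(target))
    return pixel_term + freq_weight * freq_term


# A checkerboard stands in for a detailed "natural" frame, and a flat gray
# frame for an over-smoothed synthesized one with the same mean brightness.
natural = [[(x + y) % 2 for x in range(4)] for y in range(4)]
smoothed = [[0.5] * 4 for _ in range(4)]
print(joint_loss(smoothed, natural))
```

The two toy frames share the same mean intensity, but the smoothed one lacks the
checkerboard's high-frequency component, so the frequency term adds a penalty on
top of the pixel term.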

| Search Query: ArXiv Query: search_query=au:"Yi Zhou"&id_list=&start=0&max_results=3
