Kavli Affiliate: Zheng Zhu
| First 5 Authors: Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou
| Summary:
Talking head synthesis is an emerging technology with wide applications in
film dubbing, virtual avatars and online education. Recent NeRF-based methods
generate more natural talking videos, as they better capture the 3D structural
information of faces. However, these methods require a separate model to be
trained for each identity on a large dataset. In this paper, we propose Dynamic Facial
Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly
generalize to an unseen identity from only a small amount of training data.
Unlike existing NeRF-based methods, which directly encode the 3D geometry and
appearance of a specific person into the network, our DFRF conditions the face
radiance field on 2D appearance images to learn a face prior (a rough sketch of
such conditioning follows this record). The facial radiance field can thus be
flexibly adapted to a new identity with only a few reference
images. Additionally, to better model facial deformations, we propose a
differentiable face warping module, conditioned on audio signals, that deforms
all reference images into the query space (a second sketch below illustrates
such a warp). Extensive experiments show that
with only tens of seconds of a training clip available, our proposed DFRF can
synthesize natural, high-quality audio-driven talking head videos for novel
identities within only 40k training iterations. We highly recommend that readers
view our supplementary video for intuitive comparisons. Code is available at
https://sstzal.github.io/DFRF/.
| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=10
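
To make the conditioning idea in the abstract concrete, here is a minimal PyTorch-style sketch of a radiance-field MLP that takes, in addition to the usual position and view-direction encodings, features sampled from a few 2D reference images. The feature dimensions, layer sizes, and mean-pooling aggregation across references are illustrative assumptions, not the authors' exact DFRF architecture.

```python
# Minimal sketch (not the authors' exact architecture): a NeRF-style MLP
# conditioned on features gathered from 2D reference images, so the radiance
# field can adapt to a new identity from a few frames. Dimensions and the
# mean-pooling aggregation are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionedRadianceField(nn.Module):
    def __init__(self, pos_dim=63, dir_dim=27, feat_dim=64, hidden=256):
        super().__init__()
        # Positional/view-direction encodings are assumed to be precomputed.
        self.backbone = nn.Sequential(
            nn.Linear(pos_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, pos_enc, dir_enc, ref_feats):
        # ref_feats: (N_points, N_ref, feat_dim) features sampled from the
        # reference images at the 2D projections of each 3D query point.
        cond = ref_feats.mean(dim=1)                 # aggregate across references
        h = self.backbone(torch.cat([pos_enc, cond], dim=-1))
        sigma = torch.relu(self.density_head(h))                 # volume density
        rgb = self.color_head(torch.cat([h, dir_enc], dim=-1))   # view-dependent color
        return rgb, sigma
```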
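
And a minimal sketch of an audio-conditioned, differentiable warp of a reference feature map toward the query space, in the spirit of the face warping module described in the abstract. The offset-predicting MLP, the feature sizes, and the use of F.grid_sample are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch: predict per-pixel 2D offsets from an audio feature and warp
# a reference feature map with a differentiable grid_sample. All shapes and
# the offset network are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioConditionedWarp(nn.Module):
    def __init__(self, audio_dim=29, feat_dim=64, hidden=128):
        super().__init__()
        # Predict a 2D offset for every location of the base sampling grid,
        # conditioned on the grid coordinate and the audio feature.
        self.offset_mlp = nn.Sequential(
            nn.Linear(2 + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Tanh(),
        )

    def forward(self, ref_feat, audio_feat, max_offset=0.1):
        # ref_feat: (B, C, H, W) feature map of one reference image.
        # audio_feat: (B, audio_dim) audio feature of the query frame.
        B, _, H, W = ref_feat.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=ref_feat.device),
            torch.linspace(-1, 1, W, device=ref_feat.device),
            indexing="ij",
        )
        grid = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)   # base grid
        a = audio_feat[:, None, None, :].expand(B, H, W, -1)
        offsets = self.offset_mlp(torch.cat([grid, a], dim=-1)) * max_offset
        # grid_sample keeps the warp differentiable w.r.t. the offsets.
        return F.grid_sample(ref_feat, grid + offsets, align_corners=True)
```

Because grid_sample is differentiable with respect to the sampling grid, gradients flow back into the offset predictor, which is what allows such a warping module to be trained end to end with the rest of the pipeline.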