SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation

Kavli Affiliate: Ke Wang

| First 5 Authors: Lixiong Qin, Mei Wang, Chao Deng, Ke Wang, Xi Chen

| Summary:

In recent years, vision transformers have been introduced into face
recognition and analysis and have achieved performance breakthroughs. However,
most previous methods train a single model or an ensemble of models
to perform the desired task. This ignores the synergy among different tasks
and forgoes potential gains in prediction accuracy, data efficiency,
and training time. This paper presents a multi-purpose algorithm for
simultaneous face recognition, facial expression recognition, age estimation,
and face attribute estimation (40 attributes including gender) based on a
single Swin Transformer. Our design, SwinFace, consists of a single shared
backbone together with a subnet for each set of related tasks. To address the
conflicts among multiple tasks and meet the different demands of tasks, a
Multi-Level Channel Attention (MLCA) module is integrated into each
task-specific analysis subnet, which can adaptively select the features from
optimal levels and channels to perform the desired tasks. Extensive experiments
show that the proposed model has a better understanding of the face and
achieves excellent performance for all tasks. In particular, it achieves 90.97%
accuracy on RAF-DB and an $\epsilon$-error of 0.22 on CLAP2015, which are
state-of-the-art results for facial expression recognition and age estimation,
respectively. The code and models will be made publicly available at
https://github.com/lxq1000/SwinFace.
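
The shared-backbone layout described in the summary can be illustrated with a small sketch. This is not the authors' implementation: the class `MultiTaskSketch`, the helper names, the averaging "backbone", and the sigmoid gate standing in for MLCA are all hypothetical simplifications chosen only to show how one set of multi-level features might feed several task-specific subnets, each with its own channel gating.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def channel_attention(levels, gate_w, gate_b):
    """Simplified stand-in for MLCA (an assumption, not the paper's module).

    Concatenates the feature vectors from all levels, computes one sigmoid
    gate per channel, and rescales each channel by its gate.
    """
    concat = [c for level in levels for c in level]
    gates = [sigmoid(g) for g in linear(gate_w, gate_b, concat)]
    return [g * c for g, c in zip(gates, concat)]


def make_layer(rng, n_out, n_in):
    """Random small weights and zero biases for an n_in -> n_out dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskSketch:
    """One shared feature extractor plus one gated subnet per task."""

    def __init__(self, level_dims, task_outputs, seed=0):
        rng = random.Random(seed)
        total = sum(level_dims)
        self.level_dims = level_dims
        # Each task owns a gate (channel attention) and a prediction head.
        self.subnets = {
            task: (make_layer(rng, total, total), make_layer(rng, n_out, total))
            for task, n_out in task_outputs.items()
        }

    def backbone(self, pixels):
        # Placeholder for the Swin Transformer stages: each "level" is a
        # strided average of the input. Requires len(pixels) >= max(level_dims).
        return [[sum(pixels[i::d]) / len(pixels[i::d]) for i in range(d)]
                for d in self.level_dims]

    def forward(self, pixels):
        levels = self.backbone(pixels)  # computed once, shared by all tasks
        outputs = {}
        for task, (gate, head) in self.subnets.items():
            attended = channel_attention(levels, *gate)
            outputs[task] = linear(*head, attended)
        return outputs
```

Calling `MultiTaskSketch([4, 8], {"expression": 7, "age": 1, "attributes": 40}).forward(pixels)` on a flat list of at least 8 pixel values returns one output vector per task. The output sizes here are illustrative assumptions; only the count of 40 attributes comes from the summary. The point of the design is that the backbone runs once per image while each subnet decides for itself which levels and channels to emphasize.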

| Search Query: ArXiv Query: search_query=au:”Ke Wang”&id_list=&start=0&max_results=3