Kavli Affiliate: Jia Liu
| First 5 Authors: Bing Han, Wen Huang, Zhengyang Chen, Anbai Jiang, Pingyi Fan
| Summary:
The goal of the acoustic scene classification (ASC) task is to classify
recordings into one of the predefined acoustic scene classes. However, in
real-world scenarios, ASC systems often encounter challenges such as recording
device mismatch, low-complexity constraints, and the limited availability of
labeled data. To alleviate these issues, this paper builds a data-efficient and
low-complexity ASC system with a new model architecture and better
training strategies. Specifically, we first design a new low-complexity
architecture named Rep-Mobile by integrating multi-convolution branches that
can be reparameterized at inference. Compared to other models, it achieves
better performance with lower computational complexity. Then we apply a
knowledge distillation strategy and compare the data efficiency of teacher
models with different architectures. Finally, we propose a
progressive pruning strategy, which prunes the model multiple times
in small amounts and yields better performance than single-step
pruning. Experiments are conducted on the TAU dataset. With Rep-Mobile and
these training strategies, our proposed ASC system achieves
state-of-the-art (SOTA) results and won first place by
a significant margin in the DCASE2024 Challenge.
| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3
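
The summary states that Rep-Mobile's multi-convolution branches can be reparameterized at inference. As a minimal sketch of that general idea (not the authors' architecture, whose details are not given here), the pure-Python example below assumes a hypothetical single-channel block with a 3x3 branch, a 1x1 branch, and an identity branch, with no normalization layers. Because convolution is linear, the three branches can be folded into one 3x3 kernel that produces the same output. All function names and values are illustrative assumptions.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form (assumed): sum of 3x3, 1x1, and identity branches."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1]])
    h, w = len(image), len(image[0])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(w)] for i in range(h)]


def merge_branches(k3, k1):
    """Inference-time form: fold all three branches into a single 3x3 kernel."""
    merged = [row[:] for row in k3]
    # The 1x1 weight and the identity (weight 1.0) both act on the centre tap.
    merged[1][1] += k1 + 1.0
    return merged


# Illustrative values only; they are not taken from the paper.
image = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [2.0, 1.0, 1.0]]
k3 = [[0.5, 0.0, -0.5], [1.0, 0.25, 0.0], [0.0, -1.0, 0.5]]
k1 = 2.0

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, merge_branches(k3, k1))
max_diff = max(
    abs(train_out[i][j] - infer_out[i][j]) for i in range(3) for j in range(3)
)
print("max difference between the two forms:", max_diff)
```

The merged kernel costs one convolution instead of three, which is where the inference-time saving in this kind of design would come from. A real block would also need to fold any normalization layers and handle multiple channels.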