Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

Kavli Affiliate: Jia Liu

| First 5 Authors: Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Weiping Wen

| Summary:

Speech recognition is an essential entry point for human-computer interaction,
and deep learning models have recently achieved excellent results on this
task. However, when model training and private data provision are handled by
separate parties, security threats that cause deep neural networks (DNNs) to
behave abnormally deserve study. In recent years, backdoor attacks on speech
recognition systems have been studied. Existing backdoor methods are based on
data poisoning: the attacker embeds changes in benign speech spectrograms or
alters speech components such as pitch and timbre. As a result, the poisoned
data can be detected by human listeners or by automatic deep learning
algorithms. To improve the stealthiness of data poisoning, we
propose a non-neural and fast algorithm called Random Spectrogram Rhythm
Transformation (RSRT) in this paper. The algorithm combines four steps to
generate stealthy poisoned utterances. From the perspective of rhythm component
transformation, our proposed trigger stretches or squeezes the mel spectrograms
and then converts them back to audio signals. The operation keeps timbre and content
unchanged for good stealthiness. We run experiments on two kinds of speech
recognition tasks and test the stealthiness of poisoned samples with speaker
verification and automatic speech recognition. The results show that our
method is both effective and stealthy: the rhythm trigger needs only a low
poisoning rate and achieves a very high attack success rate.
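
The idea of stretching or squeezing a spectrogram along its time axis can be illustrated with a minimal sketch. This is not the authors' RSRT implementation: the summary does not spell out the four steps, so the fixed segment length, the rate range, and the function names `stretch_frames` and `random_rhythm_transform` are all assumptions made for illustration. The spectrogram is represented as a plain list of frames (each frame a list of mel-bin values), and the conversion back to a waveform is left out.

```python
import random


def stretch_frames(frames, rate):
    """Resample a list of spectrogram frames along time by `rate`.

    rate > 1 lengthens the segment (slower speech), rate < 1 shortens it.
    Linear interpolation between neighbouring frames is an assumption.
    """
    n = len(frames)
    new_n = max(1, round(n * rate))
    if n == 1 or new_n == 1:
        return [list(frames[0]) for _ in range(new_n)]
    out = []
    for i in range(new_n):
        pos = i * (n - 1) / (new_n - 1)  # fractional source frame index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def random_rhythm_transform(frames, segment_len=20, low=0.8, high=1.25, seed=0):
    """Stretch or squeeze each fixed-length segment by a random rate.

    segment_len, low, high and seed are hypothetical parameters, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    out = []
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        out.extend(stretch_frames(segment, rng.uniform(low, high)))
    return out


# Toy "spectrogram": 100 frames with 2 mel bins each.
toy = [[float(t), 2.0 * t] for t in range(100)]
poisoned = random_rhythm_transform(toy)
```

Only the frame count changes here; the number of mel bins per frame is untouched, which loosely mirrors the claim that the rhythm changes while timbre and content are kept. A real pipeline would still need a vocoder or similar inversion step to turn the modified spectrogram back into audio.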

| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3