EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models

Kavli Affiliate: Jia Liu

| First 5 Authors: Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

| Summary:

Deep speech classification tasks, chiefly keyword spotting and speaker
verification, play a crucial role in speech-based human-computer interaction.
Recent work has shown that these technologies are vulnerable to backdoor
attacks. In existing triggers, speech samples are attacked through noise
disruption or modification of speech components. We suggest that speech
backdoor attacks can instead strategically focus on emotion, a higher-level
subjective perceptual attribute inherent in speech. Furthermore, we propose
that emotional voice conversion technology can serve as the speech backdoor
attack trigger, a method we call EmoAttack. Based on this, we conducted attack
experiments on two speech classification tasks, showing that EmoAttack is an
effective trigger, with a remarkable attack success rate and accuracy
variance. Additionally, the ablation experiments found that speech with
intense emotion is more suitable as an attack target.
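
To make the general shape of a trigger-based backdoor attack concrete, here is a minimal Python sketch of the two steps the summary mentions: poisoning part of a training set with a trigger, and measuring attack success rate on triggered test samples. It is not the authors' implementation. The names `poison_dataset`, `attack_success_rate`, and `toy_trigger` are hypothetical, and `toy_trigger` is a trivial amplitude scaling used only as a stand-in for an emotional voice conversion model, which the summary does not specify. The attack success rate is computed here under the common definition (the fraction of triggered, non-target samples classified as the target label), which is an assumption about the paper's metric.

```python
import random
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]
Sample = Tuple[Waveform, int]  # (waveform, class label)


def toy_trigger(wave: Waveform) -> Waveform:
    """Placeholder trigger (ASSUMPTION): halves the amplitude.

    A real attack of the kind described would apply an emotional
    voice conversion model here instead.
    """
    return [0.5 * x for x in wave]


def poison_dataset(
    samples: Sequence[Sample],
    trigger: Callable[[Waveform], Waveform],
    target_label: int,
    poison_rate: float,
    seed: int = 0,
) -> List[Sample]:
    """Apply the trigger to a random subset and relabel it as the target."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * poison_rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned: List[Sample] = []
    for i, (wave, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((trigger(wave), target_label))
        else:
            poisoned.append((wave, label))
    return poisoned


def attack_success_rate(
    predict: Callable[[Waveform], int],
    trigger: Callable[[Waveform], Waveform],
    test_samples: Sequence[Sample],
    target_label: int,
) -> float:
    """Fraction of triggered non-target samples predicted as the target."""
    victims = [wave for wave, label in test_samples if label != target_label]
    if not victims:
        return 0.0
    hits = sum(1 for wave in victims if predict(trigger(wave)) == target_label)
    return hits / len(victims)
```

In this sketch the classifier is passed in as a plain `predict` callable, so the same helpers could wrap a keyword spotting or speaker verification model without changes.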

| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3
