Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

Kavli Affiliate: Feng Wang

| First 5 Authors: Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan

| Summary:

Speaker recognition systems (SRSs) have recently been shown to be vulnerable
to adversarial attacks, raising significant security concerns. In this work, we
systematically investigate transformation- and adversarial-training-based
defenses for securing SRSs. Guided by the characteristics of SRSs, we present
22 diverse transformations and thoroughly evaluate them using 7 recent
promising adversarial attacks (4 white-box and 3 black-box) on speaker
recognition. Following best practices for defense evaluations, we analyze the
ability of the transformations to withstand adaptive attacks. We also
evaluate and understand their effectiveness against adaptive attacks when
combined with adversarial training. Our study provides many useful insights
and findings, many of which are new or inconsistent with the conclusions in the
image and speech recognition domains, e.g., variable-bit-rate and
constant-bit-rate speech compression perform differently, and some non-differentiable
transformations remain effective against currently promising evasion techniques
that often work well in the image domain. We demonstrate that our proposed
feature-level transformation combined with adversarial training is considerably
more effective than adversarial training alone in a complete white-box setting,
e.g., increasing the accuracy by 13.62% and the attack cost by two orders of
magnitude, while other transformations do not necessarily improve the
overall defense capability. This work sheds further light on the research
directions in this field. We also release our evaluation platform SPEAKERGUARD
to foster further research.
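
To make the defended pipeline concrete, below is a minimal PyTorch sketch of an input-transformation defense (audio re-quantization, one simple non-differentiable transformation) combined with PGD-based adversarial training. The model interface, hyperparameters (eps, alpha, steps), and function names are illustrative assumptions for exposition, not the actual SPEAKERGUARD API or the paper's proposed feature-level transformation.

```python
# Illustrative sketch only: a non-differentiable input transformation
# (re-quantization) plus a PGD adversarial-training step for a speaker
# classifier. All names and hyperparameters are assumptions, not the
# SPEAKERGUARD API.
import torch
import torch.nn.functional as F

def quantize(waveform: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Re-quantize audio in [-1, 1] to 2**(bits-1) levels; torch.round
    makes this transformation non-differentiable in its input."""
    levels = 2 ** (bits - 1)
    return torch.round(waveform * levels) / levels

def pgd_attack(model, x, y, eps=0.002, alpha=0.0004, steps=10):
    """L-infinity PGD on raw waveforms, the usual inner loop of
    adversarial training."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
        x_adv = x_adv.clamp(-1.0, 1.0).detach()   # keep a valid waveform range
    return x_adv

def adversarial_train_step(model, optimizer, x, y):
    """One training step: craft PGD examples, pass them through the
    transformation defense, and update the speaker classifier."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(quantize(x_adv)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```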
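Conversely, the evasion techniques that often work well in the image domain include adaptive attacks such as BPDA (Backward Pass Differentiable Approximation, Athalye et al. 2018), which circumvent a non-differentiable transformation by approximating its gradient. A minimal sketch, assuming the same quantize-style transform as above:

```python
# Hedged sketch of BPDA: the non-differentiable defense runs unchanged in
# the forward pass, while the backward pass substitutes the identity (a
# straight-through estimator), letting gradient-based attacks "see
# through" the transformation.
import torch
import torch.nn.functional as F

class BPDAIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, transform):
        # Apply the actual (possibly non-differentiable) defense transform.
        return transform(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Approximate d(transform)/dx by the identity; the callable
        # argument receives no gradient.
        return grad_output, None

def bpda_loss(model, transform, x_adv, y):
    """Attack loss whose gradient flows through the defense via BPDA."""
    logits = model(BPDAIdentity.apply(x_adv, transform))
    return F.cross_entropy(logits, y)
```

The finding highlighted in the summary is that, unlike in the image domain, some non-differentiable transformations on audio remain effective even against such adaptive attacks.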

| Search Query: ArXiv Query: search_query=au:"Feng Wang"&id_list=&start=0&max_results=10
