CPCLDETECTOR: Knowledge Enhancement and Alignment Selection for Chinese Patronizing and Condescending Language Detection

Kavli Affiliate: Long Zhang

| First 5 Authors: Jiaxun Yang

| Summary:

Chinese Patronizing and Condescending Language (CPCL) is a form of implicitly
discriminatory toxic speech targeting vulnerable groups on Chinese video
platforms. The existing dataset lacks user comments, which are a direct
reflection of video content. This undermines the model’s understanding of video
content and results in the failure to detect some CPCL videos. To address this
gap, this research builds a new dataset, PCLMMPLUS, that includes
103k comment entries and expands the dataset size. We also propose the
CPCLDetector model with alignment selection and knowledge-enhanced comment
content modules. Extensive experiments show the proposed CPCLDetector
outperforms the SOTA on PCLMM and achieves higher performance on PCLMMPLUS.
CPCL videos are detected more accurately, supporting content governance and
protecting vulnerable groups. Code and dataset are available at
https://github.com/jiaxunyang256/PCLD.

| Search Query: ArXiv Query: search_query=au:"Long Zhang"&id_list=&start=0&max_results=3