Kavli Affiliate: Zhuo Li | First 5 Authors: Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao | Summary: Despite substantial advancements in artificial intelligence, large language models (LLMs) remain challenged by generation safety. With adversarial jailbreaking prompts, one can effortlessly induce LLMs to output harmful content, causing unexpected negative social impacts. […]
Continue reading: Atoxia: Red-teaming Large Language Models with Target Toxic Answers