Kavli Affiliate: Zhuo Li | First 5 Authors: Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao | Summary: Large Language Models (LLMs) have become a focal point in the rapidly evolving field of artificial intelligence. However, a critical concern is the presence of toxic content within the pre-training corpus of these models, which […]
Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models