Kavli Affiliate: Zhuo Li | First 5 Authors: Yuting Tan | Summary: Gradient-based adversarial prompting, such as the Greedy Coordinate Gradient (GCG) algorithm, has emerged as a powerful method for jailbreaking large language models (LLMs). In this paper, we present a systematic appraisal of GCG and its annealing-augmented variant, T-GCG, across […]
The Resurgence of GCG Adversarial Attacks on Large Language Models