Kavli Affiliate: Jing Wang | First 5 Authors: Minghui Li | Summary: Direct Prompt Injection (DPI) attacks pose a critical security threat to Large Language Models (LLMs) because of their low barrier to execution and high potential for damage. To address the impracticality of existing white-box/gray-box methods and the poor transferability of […]
Continue reading: Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling