Kavli Affiliate: Zhuo Li
| First 5 Authors: Xuying Li, Zhuo Li, Yuji Kosuga, Victor Bian
| Summary:
Large Language Models (LLMs) have demonstrated strong reasoning capabilities,
but their safety under adversarial conditions remains a challenge. This study
examines the impact of output length on the robustness of DeepSeek-R1,
particularly in Forced Thinking scenarios. We analyze responses across various
adversarial prompts and find that while longer outputs can improve safety
through self-correction, certain attack types exploit extended generations. Our
findings suggest that output length should be dynamically controlled to balance
reasoning effectiveness and security. We propose reinforcement learning-based
policy adjustments and adaptive token length regulation to enhance LLM safety.
| Search Query: ArXiv Query: search_query=au:"Zhuo Li"&id_list=&start=0&max_results=3