Kavli Affiliate: Ke Wang
| First 5 Authors: Houxing Ren, Houxing Ren, , ,
| Summary:
The code generation capabilities of Large Language Models (LLMs) have
advanced applications like tool invocation and problem-solving. However,
improving performance in code-related tasks remains challenging due to limited
training data that is verifiable with accurate test cases. While Direct
Preference Optimization (DPO) has shown promise, existing methods for
generating test cases still face limitations. In this paper, we propose a novel
approach that splits code snippets into smaller, granular blocks, creating more
diverse DPO pairs from the same test cases. Additionally, we introduce the
Abstract Syntax Tree (AST) splitting and curriculum training method to enhance
the DPO training. Our approach demonstrates significant improvements in code
generation tasks, as validated by experiments on benchmark datasets such as
HumanEval (+), MBPP (+), APPS, LiveCodeBench, and BigCodeBench. Code and data
are available at https://github.com/SenseLLM/StructureCoder.
| Search Query: ArXiv Query: search_query=au:”Ke Wang”&id_list=&start=0&max_results=3