ToolCoder: Teach Code Generation Models to use API search tools

Kavli Affiliate: Zhuo Li

| First 5 Authors: Kechi Zhang, Huangzhao Zhang, Ge Li, Jia Li, Zhuo Li

| Summary:

Automatically generating source code from natural language descriptions has
been a growing field of research in recent years. However, current large-scale
code generation models often encounter difficulties when selecting appropriate
APIs for specific contexts. These models may generate APIs that do not meet
requirements or refer to non-existent APIs in third-party libraries, especially
for lesser-known or private libraries. Inspired by the process of human
developers using tools to search APIs, we propose ToolCoder, a novel approach
that integrates API search tools with existing models to assist in code
generation and API selection. To teach our model to use tools, we introduce an
automated data annotation method using ChatGPT to add tool usage information
into the source code data and fine-tune code generation models. During
inference, we integrate API search tools into the generation process so that
our model can automatically use the search tool to get suggestions when
selecting an API. Our experimental results demonstrate that ToolCoder exhibits
excellent performance and generalization across five public and private library
code generation benchmarks, with at least 6.21% improvement on average pass@1
metrics and 9.64% improvement on average pass@10 metrics compared to
state-of-the-art methods. Furthermore, we show that our relatively small
ToolCoder model is comparable to one of the current best models, GPT-3.5,
highlighting the potential of incorporating programming tools into the code
generation process.

| Search Query: ArXiv Query: search_query=au:”Zhuo Li”&id_list=&start=0&max_results=3