Kavli Affiliate: Xiang Zhang
| First 5 Authors: Tianchun Wang
| Summary:
Scaling inference for large language models (LLMs) is key to unlocking greater
performance, and leveraging diversity has proven an effective way to enhance
it. Motivated by the observed relationship between solution accuracy and
meaningful response diversity, we systematically study the effect of prompt
diversity in scaling inference. We theoretically explain why diversified
sampling improves Best-of-$N$ scaling, showing that responses generated from
meaningfully diverse prompts after Best-of-$N$ selection exhibit significantly
lower error rates than those produced from stationary prompts. To promote
solution diversity, we analyze perturbation fidelity and show that moderately
relevant perturbations improve performance, providing guidance for effective
perturbation design. Further, we present a set of effective perturbations,
including task-level and query-level ones, and analyze the conditions under
which they succeed. We systematically evaluate diversified sampling across
tasks, finding relative gains of 10.8% in EM@100 for reasoning, 9.6% for
mathematics, and 9.5% in Pass@100 for code generation.
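
To make the idea concrete, here is a minimal Python sketch of Best-of-$N$ selection with prompt perturbations. It is an illustration under stated assumptions, not the authors' implementation: the function name `diversified_best_of_n`, the round-robin assignment of perturbations, and the `generate` and `score` callables are all hypothetical stand-ins for an LLM call and a verifier or reward model.

```python
from typing import Callable, List, Sequence


def diversified_best_of_n(
    query: str,
    perturbations: Sequence[str],
    generate: Callable[[str], str],
    score: Callable[[str], float],
    n: int,
) -> str:
    """Sample n responses from perturbed prompts and return the best one.

    All names here are hypothetical. `generate` stands in for an LLM call
    and `score` for a verifier or reward model; neither is taken from the
    paper. An empty perturbation string means "use the query unchanged".
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not perturbations:
        raise ValueError("perturbations must not be empty")

    candidates: List[str] = []
    for i in range(n):
        # Assumption: cycle through the perturbations round-robin so each
        # one is used roughly n / len(perturbations) times.
        hint = perturbations[i % len(perturbations)]
        prompt = f"{hint}\n\n{query}" if hint else query
        candidates.append(generate(prompt))

    # Best-of-N selection: keep the highest-scoring candidate.
    return max(candidates, key=score)
```

Passing a single empty perturbation (`[""]`) reduces this to ordinary Best-of-$N$ with a stationary prompt, which gives a simple baseline for comparing the two settings.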
| Search Query: ArXiv Query: search_query=au:"Xiang Zhang"&id_list=&start=0&max_results=3