Kavli Affiliate: Michael P. Brenner
| First 5 Authors: Haining Pan, Nayantara Mudur, Will Taranto, Maria Tikhanovskaya, Subhashini Venugopalan
| Summary:
Large language models (LLMs) have demonstrated an unprecedented ability to
perform complex tasks in multiple domains, including mathematical and
scientific reasoning. We demonstrate that with carefully designed prompts, LLMs
can accurately carry out key calculations in research papers in theoretical
physics. We focus on a broadly used approximation method in quantum physics:
the Hartree-Fock method, which requires an analytic multi-step calculation to
derive an approximate Hamiltonian and the corresponding self-consistency
equations. To carry
out the calculations using LLMs, we design multi-step prompt templates that
break down the analytic calculation into standardized steps with placeholders
for problem-specific information. We evaluate GPT-4’s performance in executing
the calculation for 15 research papers from the past decade, demonstrating
that, with correction of intermediate steps, it correctly derives the final
Hartree-Fock Hamiltonian in 13 cases and makes minor errors in 2 cases.
Aggregating across all research papers, we find an average score of 87.5 (out
of 100) on the execution of individual calculation steps. Overall, the
requisite skill for doing these calculations is at the graduate level in
quantum condensed matter theory. We further use LLMs to mitigate the two
primary bottlenecks in this evaluation process: (i) extracting information from
papers to fill in templates and (ii) automatically scoring the calculation
steps, demonstrating good results in both cases. The strong performance is a
first step toward developing algorithms that automatically explore theoretical
hypotheses at an unprecedented scale.
| Search Query: ArXiv Query: search_query=au:"Michael P. Brenner"&id_list=&start=0&max_results=3
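
For context, a minimal sketch of the kind of derivation the paper targets: the standard Hartree-Fock mean-field decoupling of a generic two-body interaction. The index conventions below are chosen for illustration and are not taken from any specific paper in the benchmark.

% Two-body interaction: H_int = (1/2) \sum_{ijkl} V_{ijkl} c_i^\dagger c_j^\dagger c_l c_k
\begin{align}
H_{\mathrm{HF}} = \frac{1}{2}\sum_{ijkl} V_{ijkl}\Big[
      &\langle c_i^\dagger c_k\rangle\, c_j^\dagger c_l
      + c_i^\dagger c_k\, \langle c_j^\dagger c_l\rangle
      - \langle c_i^\dagger c_l\rangle\, c_j^\dagger c_k
      - c_i^\dagger c_l\, \langle c_j^\dagger c_k\rangle \nonumber\\
      &- \langle c_i^\dagger c_k\rangle\langle c_j^\dagger c_l\rangle
      + \langle c_i^\dagger c_l\rangle\langle c_j^\dagger c_k\rangle \Big],
\end{align}

with the self-consistency condition that the expectation values \langle c_i^\dagger c_j\rangle are evaluated in the ground state of H_{\mathrm{HF}} itself and iterated until they converge.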
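The multi-step prompt templates with placeholders could look something like the following minimal Python sketch. The step wording, placeholder names (system, basis, order_parameters), and the build_prompts helper are illustrative assumptions, not the authors' actual templates or scoring pipeline.

# Illustrative sketch: a multi-step prompt template with placeholders for
# problem-specific information, filled so each LLM call handles one
# standardized step of the Hartree-Fock derivation.

TEMPLATE_STEPS = [
    "Write down the second-quantized interaction Hamiltonian for "
    "{system}, using the basis {basis}.",
    "Apply the Hartree-Fock mean-field decoupling to the interaction "
    "term, keeping both Hartree and Fock contributions.",
    "Collect the mean-field Hamiltonian into quadratic form in momentum "
    "space.",
    "State the self-consistency equations for the order parameters "
    "{order_parameters}.",
]

def build_prompts(system: str, basis: str, order_parameters: str) -> list[str]:
    """Fill the template placeholders with paper-specific information."""
    context = {"system": system, "basis": basis,
               "order_parameters": order_parameters}
    return [step.format(**context) for step in TEMPLATE_STEPS]

if __name__ == "__main__":
    # Hypothetical example inputs, for illustration only.
    prompts = build_prompts(
        system="twisted bilayer graphene near the magic angle",
        basis="moire band creation operators c_{k,n,s}",
        order_parameters="<c_{k,n,s}^dagger c_{k,n',s'}>",
    )
    for i, prompt in enumerate(prompts, 1):
        print(f"Step {i}: {prompt}\n")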