Kavli Affiliate: Debanjan Chowdhury | First 5 Authors: Haining Pan | Summary: Large language models (LLMs) have shown remarkable progress in coding and math problem-solving, but evaluation on advanced research-level problems in hard sciences remains scarce. To fill this gap, we present CMT-Benchmark, a dataset of 50 problems covering condensed matter […]
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers