Kavli Affiliate: Max Tegmark | First 5 Authors: Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun | Summary: We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough […]
DafnyBench: A Benchmark for Formal Software Verification