Kavli Affiliate: Max Tegmark | First 5 Authors: Sergiu Bursuc | Summary: We present and test the largest benchmark for vericoding, the LLM generation of formally verified code from formal specifications – in contrast to vibe coding, which generates potentially buggy code from a natural-language description. Our benchmark contains 12,504 formal specifications, […]
Continue reading: A benchmark for vericoding: formally verified program synthesis