Kavli Affiliate: Salman Habib | Summary: Large Language Models (LLMs) are increasingly used for software development, yet existing benchmarks for LLM-based coding assistance do not reflect the constraints of High Energy Physics (HEP) and High Performance Computing (HPC) software. Code correctness must respect science constraints, and changes must integrate into large, performance-critical codebases with complex dependencies […]
Continue reading: CelloAI Benchmarks: Toward Repeatable Evaluation of AI Assistants