Kavli Affiliate: Li Xin Li | First 5 Authors: Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao | Summary: Existing large language model (LLM) evaluation benchmarks primarily focus on English, while current multilingual tasks lack parallel questions that specifically assess cross-linguistic reasoning abilities. This dual limitation makes it challenging to comprehensively assess LLMs’ […]
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation