Measuring the Inconsistency of Large Language Models in Preferential Ranking

Kavli Affiliate: Ke Wang

| First 5 Authors: Xiutian Zhao, Ke Wang, Wei Peng

| Summary:

Despite large language models’ (LLMs) recent advancements, their bias and
hallucination issues persist, and their ability to offer consistent
preferential rankings remains underexplored. This study investigates the
capacity of LLMs to provide consistent ordinal preferences, a crucial aspect in
scenarios with dense decision spaces or without absolute answers. We introduce a
formalization of consistency based on order theory, outlining criteria such as
transitivity, asymmetry, reversibility, and independence of irrelevant
alternatives. Our diagnostic experiments on selected state-of-the-art LLMs
reveal their inability to meet these criteria, indicating a strong positional
bias and poor transitivity, with preferences easily swayed by irrelevant
alternatives. These findings highlight a significant inconsistency in
LLM-generated preferential rankings, underscoring the need for further research
to address these limitations.
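
To make the order-theoretic criteria concrete, below is a minimal sketch (illustrative only, not the paper's code; all names are hypothetical) that checks a set of pairwise preference judgments for asymmetry and transitivity, two of the criteria listed above.

```python
from itertools import permutations

def check_consistency(prefs: set[tuple[str, str]]) -> dict[str, bool]:
    """Check pairwise judgments (a, b), meaning 'a is preferred over b',
    against two order-theoretic criteria: asymmetry and transitivity.
    Illustrative sketch, not the paper's implementation."""
    items = {x for pair in prefs for x in pair}
    # Asymmetry: the model must never judge both a > b and b > a.
    asymmetric = all((b, a) not in prefs for (a, b) in prefs)
    # Transitivity: whenever a > b and b > c hold, a > c must also hold.
    transitive = all(
        (a, c) in prefs
        for a, b, c in permutations(items, 3)
        if (a, b) in prefs and (b, c) in prefs
    )
    return {"asymmetry": asymmetric, "transitivity": transitive}

# Example: a cyclic set of judgments violates transitivity.
judgments = {("A", "B"), ("B", "C"), ("C", "A")}
print(check_consistency(judgments))
# -> {'asymmetry': True, 'transitivity': False}
```

The remaining two criteria are properties across prompts rather than within a single judgment set: reversibility compares the model's output when the option order is flipped (probing positional bias), and independence of irrelevant alternatives compares outputs before and after an unrelated option is added to the list.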

| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3
