Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank

Kavli Affiliate: Zhuo Li

| First 5 Authors: Mouxiang Chen, Chenghao Liu, Zemin Liu, Zhuo Li, Jianling Sun

| Summary:

The application of Unbiased Learning to Rank (ULTR) is widespread in modern
systems for training unbiased ranking models from biased click logs. The key is
to explicitly model a generation process for user behavior and fit click data
based on the examination hypothesis. Previous research found empirically that the
true latent relevance can be recovered in most cases as long as the clicks are
perfectly fitted. However, we demonstrate that this is not always achievable,
resulting in a significant reduction in ranking performance. In this work, we
aim to answer whether or when the true relevance can be recovered from click data,
which is a foundational issue for the ULTR field. We first define a ranking model as
identifiable if it can recover the true relevance up to a scaling
transformation, which is sufficient for a pairwise ranking objective. Then we explore
an equivalent condition for identifiability that can be expressed, in a novel way, as a
graph connectivity test problem: if and only if a graph (called the identifiability
graph, or IG) constructed on the underlying structure of the dataset is
connected, we can guarantee that the relevance can be correctly recovered. When
the IG is not connected, there may be bad cases leading to poor ranking
performance. To address this issue, we propose two methods, namely node
intervention and node merging, to modify the dataset and restore connectivity
of the IG. Empirical results obtained on a simulation dataset and two LTR
benchmark datasets confirm the validity of our proposed theorems and show the
effectiveness of our methods in mitigating data bias when the relevance model
is unidentifiable.
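
The connectivity test described above can be illustrated with a minimal sketch.
The summary does not spell out how the IG is built, so the construction here is
an assumption made for illustration only: nodes are bias factors (for example,
display positions), and two bias factors are linked when the same item appears
in the click log under both. The function name `is_ig_connected` and the
`(item, bias_factor)` input format are hypothetical and are not taken from the
paper.

```python
from collections import defaultdict


def is_ig_connected(observations):
    """Return True if the (assumed) identifiability graph is connected.

    observations: iterable of (item, bias_factor) pairs from a click log.
    Assumption: nodes are bias factors, and two bias factors share an edge
    when at least one item was observed under both of them.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Register every bias factor as a node and group factors by item.
    factors_by_item = defaultdict(list)
    for item, factor in observations:
        parent.setdefault(factor, factor)
        factors_by_item[item].append(factor)

    # All bias factors seen with the same item fall in one component.
    for factors in factors_by_item.values():
        for other in factors[1:]:
            union(factors[0], other)

    # Connected means at most one component (an empty graph counts as connected).
    return len({find(factor) for factor in parent}) <= 1


# Positions 1-2 and 2-3 share an item, so all three positions are linked.
connected_log = [("d1", 1), ("d1", 2), ("d2", 2), ("d2", 3)]
# Positions {1, 2} and {3, 4} never share an item, so the graph splits in two.
split_log = [("d1", 1), ("d1", 2), ("d2", 3), ("d2", 4)]

print(is_ig_connected(connected_log))
print(is_ig_connected(split_log))
```

Under this assumed construction, the first log gives a connected graph and the
second does not. The second case is the kind of dataset that the proposed node
intervention and node merging methods are meant to repair.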

| Search Query: ArXiv Query: search_query=au:"Zhuo Li"&id_list=&start=0&max_results=3