Kavli Affiliate: David Muller
| First 5 Authors: Emerson Melo
| Summary:
We establish a link between a class of discrete choice models and the theory
of online learning and multi-armed bandits. Our contributions are: (i)
sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a
special case; (ii) a new class of adversarial bandit algorithms derived from
generalized nested logit models (Wen and Koppelman, 2001); and (iii) a novel
class of generalized gradient bandit algorithms that extends beyond the widely
used softmax formulation. By relaxing
the restrictive independence assumptions inherent in softmax, our framework
accommodates correlated learning dynamics across actions, thereby broadening
the applicability of gradient bandit methods. Overall, the proposed algorithms
combine flexible model specification with computational efficiency via
closed-form sampling probabilities. Numerical experiments in stochastic bandit
settings demonstrate their practical effectiveness.
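
As a point of reference for contribution (i), the sketch below shows a textbook Exp3 update (exponential weights with importance-weighted reward estimates), the special case named in the abstract. This is a generic illustration, not the paper's algorithm; the function name `exp3`, the callback `reward_fn`, and the parameters `gamma` and `seed` are illustrative choices, and rewards are assumed to lie in [0, 1].

```python
import numpy as np

def exp3(n_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Textbook Exp3 sketch: exponential weights over arms with
    importance-weighted reward estimates. Rewards are assumed in [0, 1];
    `reward_fn(arm, t)` is a hypothetical callback returning the reward
    of the pulled arm at round t."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_arms)            # log-weights, for numerical stability
    probs = np.full(n_arms, 1.0 / n_arms)
    for t in range(horizon):
        w = np.exp(log_w - log_w.max())
        probs = (1 - gamma) * w / w.sum() + gamma / n_arms  # mix in uniform exploration
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(arm, t)      # only the pulled arm's reward is observed
        x_hat = reward / probs[arm]     # unbiased importance-weighted estimate
        log_w[arm] += gamma * x_hat / n_arms  # multiplicative-weights update
    return probs
```

A quick usage example (with a made-up Bernoulli reward): `exp3(5, 1000, lambda arm, t: float(np.random.rand() < 0.1 * (arm + 1)))`.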
| Search Query: ArXiv Query: search_query=au:"David Muller"&id_list=&start=0&max_results=3
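
For contribution (iii), the abstract contrasts its generalized gradient bandit algorithms with the widely used softmax formulation. The sketch below is a minimal textbook softmax gradient bandit (preference updates with an average-reward baseline), included only to make that softmax baseline concrete; the names `softmax_gradient_bandit`, `reward_fn`, and `alpha` are illustrative, and the paper's generalization (correlated choice probabilities, e.g. derived from generalized nested logit models) is not implemented here.

```python
import numpy as np

def softmax_gradient_bandit(n_arms, horizon, reward_fn, alpha=0.1, seed=0):
    """Textbook softmax gradient-bandit sketch: action preferences H are
    updated by stochastic gradient ascent on expected reward, using a
    running average reward as baseline. `reward_fn(arm, t)` is a
    hypothetical callback for the stochastic reward."""
    rng = np.random.default_rng(seed)
    H = np.zeros(n_arms)                # action preferences
    baseline = 0.0                      # running average reward
    pi = np.full(n_arms, 1.0 / n_arms)
    for t in range(horizon):
        # Softmax turns preferences into independent-logit sampling
        # probabilities; this is the choice rule the paper relaxes.
        pi = np.exp(H - H.max())
        pi /= pi.sum()
        arm = rng.choice(n_arms, p=pi)
        reward = reward_fn(arm, t)
        baseline += (reward - baseline) / (t + 1)
        # Raise the chosen arm's preference and lower the others,
        # in proportion to the advantage over the baseline.
        grad = -pi * (reward - baseline)
        grad[arm] += reward - baseline
        H += alpha * grad
    return pi
```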