Kavli Affiliate: David Muller | First 5 Authors: Emerson Melo | Summary: We establish a link between a class of discrete choice models and the theory of online learning and multi-armed bandits. Our contributions are: (i) sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a special case; (ii) […]
Beyond Softmax: A New Perspective on Gradient Bandits
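For readers unfamiliar with Exp3, which the summary names as a special case of the family studied, the following is a minimal sketch of the textbook Exp3 update (exponential weights mixed with uniform exploration). It is not the paper's generalized algorithm. The function names, the `gamma` exploration rate, and the `reward_fn` callback are assumptions made for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration at rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Run textbook Exp3; reward_fn(arm, t) is a hypothetical callback
    that must return a reward in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(n_rounds):
        probs = exp3_probabilities(weights, gamma)
        # Sample one arm according to the current choice probabilities.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        total_reward += reward
        # Importance-weighted estimate: unbiased for the pulled arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return exp3_probabilities(weights, gamma), total_reward


if __name__ == "__main__":
    # Toy environment (an assumption): arm 1 always pays 1, arm 0 pays 0.
    probs, total = run_exp3(lambda arm, t: 1.0 if arm == 1 else 0.0,
                            n_arms=2, n_rounds=500)
    print(probs, total)
```

The softmax-like form of `exp3_probabilities` is what connects this update to logit-style discrete choice models; other members of the family described in the summary would presumably replace that choice rule, but their exact form is not given in the excerpt.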