Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits; Zifan Li, Ambuj Tewari

Recent work on follow the perturbed leader (FTPL) algorithms for
the adversarial multi-armed bandit problem has highlighted the
role of the hazard rate of the distribution generating the
perturbations. Assuming that the hazard rate is bounded, it is
possible to provide regret analyses for a variety of FTPL
algorithms for the multi-armed bandit problem. This paper pushes
the inquiry into regret bounds for FTPL algorithms beyond the
bounded hazard rate condition. There are good reasons to do so:
natural distributions such as the uniform and Gaussian violate
the condition. We give regret bounds for both bounded support
and unbounded support distributions without assuming the hazard
rate condition. We also disprove a conjecture that the Gaussian
distribution cannot lead to a low-regret algorithm. In fact, it
turns out that it leads to near optimal regret, up to
logarithmic factors. A key ingredient in our approach is the
introduction of a new notion called the generalized hazard rate.


Note from Journals.Today : This content has been auto-generated from a syndicated feed.