Vector optimization with stochastic bandit feedback

Authors: Ararat, Çağın; Tekin, Cem
Editors: Ruiz, F.; Dy, J.; Van de Meent, J-W.
Dates: 2024-03-08; 2024-03-08; 2023-03-07
ISSN: 2640-3498
URI: https://hdl.handle.net/11693/114415

Conference Name: 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023
Date of Conference: 25 April 2023 - 27 April 2023

We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone C. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of (ϵ, δ)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the naïve elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how C and sampling budget affect the Pareto set, the returned (ϵ, δ)-PAC Pareto set, and the success of identification.

Copyright © 2023 by the author(s)
Language: en-US
Subjects: Artificial intelligence; Budget control; Stochastic systems
Type: Conference Paper
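
Illustrative note (not part of the original record): the abstract's idea of a Pareto set induced by a polyhedral ordering cone C can be sketched in a few lines of Python. The sketch assumes C is given as {x : Wx ≥ 0} for a matrix W, that the mean reward vectors are known exactly (no bandit sampling), and that design i is dominated when some other design j satisfies μ_j − μ_i ∈ C with μ_j ≠ μ_i. The names `in_cone`, `pareto_set`, `W` and `means` are hypothetical and chosen only for this example; the paper's own definitions and algorithm may differ in detail.

```python
def in_cone(W, x, tol=1e-12):
    """Return True if W x >= 0 componentwise, i.e. x lies in C = {x : W x >= 0}."""
    return all(sum(w * v for w, v in zip(row, x)) >= -tol for row in W)


def pareto_set(W, means, tol=1e-12):
    """Indices of designs that no other design dominates under the cone C.

    Assumption: design i is dominated by j when mu_j - mu_i is a nonzero
    element of C. Means are treated as known; this is not a bandit algorithm.
    """
    pareto = []
    for i, mu_i in enumerate(means):
        dominated = False
        for j, mu_j in enumerate(means):
            if j == i:
                continue
            diff = [b - a for a, b in zip(mu_i, mu_j)]
            nonzero = any(abs(d) > tol for d in diff)
            if nonzero and in_cone(W, diff, tol):
                dominated = True
                break
        if not dominated:
            pareto.append(i)
    return pareto


# Four hypothetical designs with two-dimensional mean rewards.
means = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.9), (0.4, 0.4)]

# Positive orthant: the usual componentwise (multi-objective) order.
W_orthant = [[1.0, 0.0], [0.0, 1.0]]

# A wider cone that contains the positive orthant.
W_wide = [[2.0, 1.0], [1.0, 2.0]]

print(pareto_set(W_orthant, means))
print(pareto_set(W_wide, means))
```

With the positive orthant, the first three designs are Pareto optimal; with the wider cone, more pairs become comparable and only the third design remains. This is one way to see how the choice of C affects the Pareto set.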