Browsing by Author "Schaar, M. V. D."
Now showing 1 - 3 of 3
Item Open Access
Adaptive ensemble learning with confidence bounds (Institute of Electrical and Electronics Engineers Inc., 2017) Tekin, C.; Yoon, J.; Schaar, M. V. D.
Extracting actionable intelligence from distributed, heterogeneous, correlated, and high-dimensional data sources requires run-time processing and learning both locally and globally. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally collected data instances and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, the guarantees are asymptotic. None of the existing works provides confidence estimates for the issued predictions or rate-of-learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long-run (asymptotic) and short-run (rate-of-learning) performance guarantees. Moreover, our approach yields performance guarantees with respect to the optimal local prediction strategy and is able to adapt its predictions in a data-driven manner. We illustrate the performance of Hedged Bandits in the context of medical informatics and show that it outperforms numerous online and offline ensemble learning methods.

Item Open Access
Generalized global bandit and its application in cellular coverage optimization (Institute of Electrical and Electronics Engineers, 2018) Shen, C.; Zhou, R.; Tekin, Cem; Schaar, M. V. D.
Motivated by the engineering problem of cellular coverage optimization, we propose a novel multiarmed bandit model called the generalized global bandit. We develop a series of greedy algorithms that can handle nonmonotonic but decomposable reward functions, multidimensional global parameters, and switching costs. The proposed algorithms are rigorously analyzed under the multiarmed bandit framework, where we show that they achieve bounded regret and are hence guaranteed to converge to the optimal arm in finite time. The algorithms are then applied to the cellular coverage optimization problem to achieve the optimal tradeoff between sufficient small-cell coverage and limited macro leakage without prior knowledge of the deployment environment. The performance advantage of the new algorithms over existing bandit solutions is revealed analytically and further confirmed via numerical simulations. The key element behind the performance improvement is a more efficient "trial and error" mechanism, in which any trial helps improve the knowledge of all candidate power levels.

Item Open Access
Global bandits (Institute of Electrical and Electronics Engineers, 2018) Atan, O.; Tekin, Cem; Schaar, M. V. D.
Multiarmed bandits (MABs) model sequential decision-making problems in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most prior works on MABs assume that the reward distributions of the arms are independent. However, in a wide variety of decision problems, from drug dosage to dynamic pricing, the expected rewards of different arms are correlated, so that selecting one arm provides information about the expected rewards of the other arms as well. We propose and analyze a class of models of such decision problems, which we call global bandits (GB). In the case in which the rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves bounded regret, with a bound that depends on the single true parameter of the problem. Hence, this policy selects suboptimal arms only finitely many times with probability one. For this case, we also obtain a bound on regret that is independent of the true parameter; this bound is sublinear, with an exponent that depends on the informativeness of the arms. We also propose a variant of the greedy policy that achieves O(√T) worst-case and O(1) parameter-dependent regret. Finally, we perform experiments on dynamic pricing and show that the proposed algorithms achieve significant gains with respect to well-known benchmarks.
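The core idea behind the global bandits abstract above, namely that every pull is informative about all arms when expected rewards are functions of one shared parameter, can be illustrated with a minimal sketch. This is not the paper's algorithm: the reward curves, noise model, and function names below are invented for illustration, and a simple sample average stands in for the paper's estimator.

```python
import random

# Hypothetical reward curves: each arm's expected reward is a known,
# invertible (here linear) function of one unknown parameter theta in [0, 1].
ARMS = [
    lambda th: 1.0 - th,        # arm 0: best when theta is small
    lambda th: th,              # arm 1: best when theta is large
    lambda th: 0.6 - 0.2 * th,  # arm 2: never optimal in this toy setup
]

def greedy_global_bandit(true_theta, horizon, noise=0.05, seed=0):
    """Greedy-policy sketch: every observed reward is inverted through the
    pulled arm's curve to refine a shared estimate of theta, then the arm
    with the largest predicted mean reward under that estimate is played."""
    rng = random.Random(seed)
    estimates = []
    arm = 0  # arbitrary starting arm
    for _ in range(horizon):
        # Observe a noisy reward from the chosen arm.
        reward = ARMS[arm](true_theta) + rng.gauss(0.0, noise)
        # Invert the chosen arm's (linear) curve to get a theta estimate.
        if arm == 0:
            est = 1.0 - reward
        elif arm == 1:
            est = reward
        else:
            est = (0.6 - reward) / 0.2
        estimates.append(min(1.0, max(0.0, est)))  # clip to [0, 1]
        theta_hat = sum(estimates) / len(estimates)
        # Greedy step: play the arm that looks best under theta_hat.
        arm = max(range(len(ARMS)), key=lambda k: ARMS[k](theta_hat))
    return arm

final_arm = greedy_global_bandit(true_theta=0.9, horizon=200)
```

Because each pull sharpens the single shared estimate, the policy commits to the optimal arm after finitely many mistakes, which is the informal content of the bounded-regret claim in the abstract.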