Browsing by Subject "Multi-armed bandits"
Now showing 1 - 9 of 9
Item Open Access Actionable intelligence and online learning for semantic computing (World Scientific Publishing Company, 2017) Tekin, Cem; van der Schaar, M. As the world becomes more connected and instrumented, high-dimensional, heterogeneous, and time-varying data streams are collected and need to be analyzed on the fly to extract actionable intelligence and make timely decisions based on this knowledge. This requires that appropriate classifiers are invoked to process the incoming streams and find the relevant knowledge. Thus, a key challenge becomes choosing online, at run-time, which classifier should be deployed to make the best possible predictions on the incoming streams. In this paper, we survey a class of methods capable of performing online learning in stream-based semantic computing tasks: multi-armed bandits (MABs). Adopting MABs for stream mining poses numerous new challenges and requires many new innovations. Most importantly, the MABs will need to explicitly consider and track online the time-varying characteristics of the data streams and to quickly learn what the relevant information is in the vast, heterogeneous, and possibly high-dimensional data streams. In this paper, we discuss contextual MAB methods, which use similarities in context (meta-data) information to make decisions, and discuss their advantages when applied to stream mining for semantic computing. These methods can be adapted to discover in real time the relevant contexts guiding the stream mining decisions, and to track the best classifier in the presence of concept drift. Moreover, we also discuss how stream mining of multiple data sources can be performed by deploying cooperative MAB solutions and ensemble learning.
We conclude the paper by discussing the numerous other advantages of MABs that will benefit semantic computing applications.

Item Open Access Adaptive ensemble learning with confidence bounds (Institute of Electrical and Electronics Engineers Inc., 2017) Tekin, C.; Yoon, J.; Schaar, M. V. D. Extracting actionable intelligence from distributed, heterogeneous, correlated, and high-dimensional data sources requires run-time processing and learning both locally and globally. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally collected data instances and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, these guarantees are asymptotic. None of these existing works provides confidence estimates about the issued predictions or rate-of-learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long-run (asymptotic) and short-run (rate-of-learning) performance guarantees. Moreover, our approach yields performance guarantees with respect to the optimal local prediction strategy and is also able to adapt its predictions in a data-driven manner. We illustrate the performance of Hedged Bandits in the context of medical informatics and show that it outperforms numerous online and offline ensemble learning methods.

Item Open Access Aging wireless bandits: regret analysis and order-optimal learning algorithm (IEEE, 2021-11-13) Atay, Eray Unsal; Kadota, Igor; Modiano, Eytan We consider a single-hop wireless network with sources transmitting time-sensitive information to the destination over multiple unreliable channels.
Packets from each source are generated according to a stochastic process with known statistics, and the state of each wireless channel (ON/OFF) varies according to a stochastic process with unknown statistics. The reliability of the wireless channels is to be learned through observation. At every time slot, the learning algorithm selects a single (source, channel) pair, and the selected source attempts to transmit its packet via the selected channel. The probability of a successful transmission to the destination depends on the reliability of the selected channel. The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over T time slots. To analyze its performance, we introduce the notion of AoI-regret, which is the difference between the expected cumulative AoI of the learning algorithm under consideration and the expected cumulative AoI of a genie algorithm that knows the reliability of the channels a priori. The AoI-regret captures the penalty incurred by having to learn the statistics of the channels over the T time slots. The results are twofold: first, we consider learning algorithms that employ well-known solutions to the stochastic multi-armed bandit problem (such as ϵ-Greedy, Upper Confidence Bound, and Thompson Sampling) and show that their AoI-regret scales as Θ(log T); second, we develop a novel learning algorithm and show that it has O(1) AoI-regret. To the best of our knowledge, this is the first learning algorithm with bounded AoI-regret.

Item Open Access Contextual online learning for multimedia content aggregation (Institute of Electrical and Electronics Engineers, 2015-04) Tekin, C.; Schaar, Mihaela van der The last decade has witnessed tremendous growth in both the volume and the diversity of multimedia content generated by a multitude of sources (news agencies, social media, etc.).
Faced with a variety of content choices, consumers exhibit diverse preferences for content; their preferences often depend on the context in which they consume content as well as on various exogenous events. To satisfy consumers’ demand for such diverse content, multimedia content aggregators (CAs) have emerged, which gather content from numerous multimedia sources. A key challenge for such systems is to accurately predict what type of content each of their consumers prefers in a certain context, and to adapt these predictions to evolving consumer preferences, contexts, and content characteristics. We propose a novel, distributed, online multimedia content aggregation framework, which gathers content generated by multiple heterogeneous producers to fulfill its consumers’ demand for content. Since both the multimedia content characteristics and the consumers’ preferences and contexts are unknown, the optimal content aggregation strategy is unknown a priori. Our proposed content aggregation algorithm is able to learn online what content to gather and how to match content and users by exploiting similarities between consumer types. We prove bounds for our proposed learning algorithms that guarantee both the accuracy of the predictions and the learning speed. Importantly, our algorithms operate efficiently even when feedback from consumers is missing or content and preferences evolve over time. Illustrative results highlight the merits of the proposed content aggregation system in a variety of settings.

Item Open Access Decentralized dynamic rate and channel selection over a shared spectrum (IEEE, 2021-03-15) Javanmardi, Alireza; Qureshi, Muhammad Anjum; Tekin, Cem We consider the problem of distributed dynamic rate and channel selection in a multi-user network, in which each user selects a wireless channel and a modulation and coding scheme (corresponding to a transmission rate) in order to maximize the network throughput.
We assume that the users are cooperative; however, there is no coordination or communication among them, and the number of users in the system is unknown. We formulate this problem as a multi-player multi-armed bandit problem and propose a decentralized learning algorithm that performs almost optimal exploration of the transmission rates in order to learn quickly. We prove that the regret of our learning algorithm with respect to the optimal allocation increases logarithmically over rounds, with a leading term that is logarithmic in the number of transmission rates. Finally, we compare the performance of our learning algorithm with the state of the art via simulations and show that it substantially improves the throughput and minimizes the number of collisions.

Item Open Access Fully distributed bandit algorithm for the joint channel and rate selection problem in heterogeneous cognitive radio networks (2020-12) Javanmardi, Alireza We consider the problem of distributed sequential channel and rate selection in cognitive radio networks, where multiple users choose channels from the same set of available wireless channels and pick modulation and coding schemes (corresponding to transmission rates). In order to maximize the network throughput, users need to be cooperative, while communication among them is not allowed. Also, if multiple users select the same channel simultaneously, they collide, and none of them is able to use the channel for transmission. We rigorously formulate this resource allocation problem as a multi-player multi-armed bandit problem and propose a decentralized learning algorithm called Game of Thrones with Sequential Halving Orthogonal Exploration (GoT-SHOE). The proposed algorithm keeps the number of collisions in the network as low as possible and performs almost optimal exploration of the transmission rates to speed up the learning process.
We prove that our learning algorithm achieves regret with respect to the optimal allocation that grows logarithmically over rounds, with a leading term that is logarithmic in the number of transmission rates. We also propose an extension of our algorithm that works when the number of users is greater than the number of channels. Moreover, we discuss how Sequential Halving Orthogonal Exploration can indeed be used with any distributed channel assignment algorithm to enhance its performance. Finally, we provide extensive simulations and compare the performance of our learning algorithm with the state of the art, demonstrating the superiority of the proposed algorithm in terms of higher system throughput and fewer collisions.

Item Open Access Multi-armed bandit algorithms for communication networks and healthcare (2022-06) Demirel, İlker Multi-armed bandits (MAB) is a well-established sequential decision-making framework. While the simplest MAB framework is useful in modeling a wide range of real-world applications, ranging from adaptive clinical trial design to financial portfolio management, it requires further extensions for other problems. We propose three novel MAB algorithms that are useful in optimizing bolus-insulin dose recommendation in type-1 diabetes, best channel identification in cognitive radio networks, and online recommender systems. First, we introduce and study the “safe leveling” problem, where the learner's objective is to keep the arm outcomes close to a target level rather than maximize them. We propose a novel algorithm, ESCADA, with cumulative regret and safety guarantees. We demonstrate its effectiveness against straightforward adaptations of standard MAB algorithms to the “leveling task”. Next, we study the “federated multi-armed bandit” (FMAB) problem, where a cohort of clients play the same MAB game to learn the globally best arm.
We consider adversarial “Byzantine” clients disturbing the learning process with false model updates and propose a robust algorithm, Fed-MoM-UCB. We provide theoretical guarantees on Fed-MoM-UCB while identifying the performance sacrifices that robustness requires. Finally, we study “combinatorial multi-armed bandits with probabilistically triggered arms” (CMAB-PTA), where the learner chooses a set of arms at each round that may trigger other arms. CMAB-PTA is useful in modeling various problems such as influence maximization on graphs and online recommendation systems. We propose a Gaussian process-based algorithm, ComGP-UCB. We provide upper bounds on its regret and demonstrate its effectiveness against state-of-the-art baselines when arm outcomes are correlated.

Item Open Access Robust optimization of multi-objective multi-armed bandits with contaminated bandit feedback (2022-06) Bozgan, Kerem Multi-objective multi-armed bandits (MO-MAB) is an important extension of the standard MAB problem that has found a wide variety of applications, ranging from clinical trials to online recommender systems. We consider the Pareto set identification problem in the adversarial MO-MAB setting, where at each arm pull, with probability ϵ ∈ (0, 1/2), an adversary corrupts the reward samples by replacing the true samples with samples from an arbitrary distribution of its choosing. Existing MO-MAB methods in the literature are incapable of handling such attacks unless there are strict restrictions on the contamination distributions. As a result, these methods perform poorly in practice, where such restrictions on the adversary are not valid in general. To fill this gap in the literature, we propose two different robust, median-based optimization methods that can approximate the Pareto optimal set from contaminated samples.
We prove a sample complexity bound of the form O((1/α²) log(1/δ)) for the proposed methods, where α > 0 and δ ∈ (0, 1) are accuracy and confidence parameters, respectively, that can be set by the user according to their preference. This bound matches, in the worst case, the bounds from [1, Theorem 4] and [2, Theorem 3], which consider the adversary-free setting. We compare the proposed methods with a mean-based method from the MO-MAB literature in real-world and synthetic experiments. Numerical results verify our theoretical expectations and show the importance of robust algorithm design in the adversarial setting.

Item Open Access Thompson sampling for combinatorial network optimization in unknown environments (IEEE, 2020) Hüyük, Alihan; Tekin, Cem Influence maximization, adaptive routing, and dynamic spectrum allocation all require choosing the right action from a large set of alternatives. Thanks to advances in combinatorial optimization, these and many similar problems can be efficiently solved given an environment with known stochasticity. In this paper, we take this one step further and focus on combinatorial optimization in unknown environments. We consider a very general learning framework called combinatorial multi-armed bandit with probabilistically triggered arms and a very powerful Bayesian algorithm called Combinatorial Thompson Sampling (CTS). Under the semi-bandit feedback model and assuming access to an oracle without knowing the expected base arm outcomes beforehand, we show that when the expected reward is Lipschitz continuous in the expected base arm outcomes, CTS achieves O(∑_{i=1}^{m} log T / (p_i Δ_i)) regret and O(max{E[√(mT log T / p∗)], E[m²/p∗]}) Bayesian regret, where m denotes the number of base arms, p_i and Δ_i denote the minimum non-zero triggering probability and the minimum suboptimality gap of base arm i, respectively, T denotes the time horizon, and p∗ denotes the overall minimum non-zero triggering probability.
We also show that when the expected reward satisfies triggering probability modulated Lipschitz continuity, CTS achieves O(max{√(mT log T), m²}) Bayesian regret, and when triggering probabilities are non-zero for all base arms, CTS achieves O((1/p∗) log(1/p∗)) regret independent of the time horizon. Finally, we numerically compare CTS with algorithms based on upper confidence bounds in several networking problems and show that CTS outperforms these algorithms by at least an order of magnitude in the majority of cases.
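Several of the abstracts above rest on classical index policies such as the Upper Confidence Bound rule (mentioned alongside ϵ-Greedy and Thompson Sampling in the aging-wireless-bandits entry) and its Θ(log T) regret scaling. As a minimal generic sketch of that building block, and not the algorithm of any specific item listed here, the standard textbook UCB1 policy on Bernoulli arms can be written as follows; the arm means and horizon are arbitrary values chosen for the demo.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the classical UCB1 index policy on simulated Bernoulli arms.

    Returns the number of pulls of each arm after `horizon` rounds.
    Generic textbook sketch, not the algorithm of any paper above.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k      # times each arm was selected
    totals = [0.0] * k   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # UCB index = empirical mean + logarithmic exploration bonus
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)  # the 0.8 arm should accumulate most of the pulls
```

Because the exploration bonus shrinks as log t / n_i, each suboptimal arm is pulled only O(log T) times in expectation, which is the source of the logarithmic regret terms that recur in the entries above.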