Bilkent Repository :: Browsing by Subject "Contextual bandits"

Browsing by Subject "Contextual bandits"

Now showing 1 - 9 of 9

Open Access
Adaptive ensemble learning with confidence bounds
(Institute of Electrical and Electronics Engineers Inc., 2017) Tekin, C.; Yoon, J.; Schaar, M. V. D.
Extracting actionable intelligence from distributed, heterogeneous, correlated, and high-dimensional data sources requires run-time processing and learning both locally and globally. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally collected data instances and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, these guarantees are asymptotic. None of these existing works provide confidence estimates about the issued predictions or rate-of-learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long-run (asymptotic) and short-run (rate-of-learning) performance guarantees. Moreover, our approach yields performance guarantees with respect to the optimal local prediction strategy and is also able to adapt its predictions in a data-driven manner. We illustrate the performance of Hedged Bandits in the context of medical informatics and show that it outperforms numerous online and offline ensemble learning methods.
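The fusion step described above can be illustrated with the classical Hedge (exponential weights) rule. This is a minimal sketch under stated assumptions, not the Hedged Bandits algorithm itself: each local learner is treated as an expert issuing a binary prediction, losses are 0/1, and the class name `HedgeEnsemble` and learning rate `eta` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


class HedgeEnsemble:
    """Fuse binary predictions of local learners with exponential weights (Hedge).

    Hypothetical sketch: 0/1 loss and a fixed learning rate `eta` are assumptions.
    """

    def __init__(self, n_learners: int, eta: float = 0.5) -> None:
        self.eta = eta
        self.weights = [1.0] * n_learners  # one weight per local learner

    def probabilities(self) -> List[float]:
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def predict(self, local_predictions: Sequence[int]) -> int:
        # Weighted vote: output 1 if learners predicting 1 hold at least half the mass.
        mass_for_one = sum(
            p for p, y in zip(self.probabilities(), local_predictions) if y == 1
        )
        return 1 if mass_for_one >= 0.5 else 0

    def update(self, local_predictions: Sequence[int], label: int) -> None:
        # Multiply each learner's weight by exp(-eta * loss), where loss is 0 or 1.
        for i, y in enumerate(local_predictions):
            loss = 0.0 if y == label else 1.0
            self.weights[i] *= math.exp(-self.eta * loss)


if __name__ == "__main__":
    ens = HedgeEnsemble(n_learners=3)
    stream = [([1, 0, 1], 1), ([1, 1, 0], 1), ([0, 0, 1], 0)]
    for preds, label in stream:
        guess = ens.predict(preds)  # ensemble decision before the label is revealed
        ens.update(preds, label)
    print(ens.probabilities())
```

Learners that err more often lose weight geometrically, so the ensemble's vote drifts toward the locally best learner without any learner being discarded outright.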
Open Access
Asymptotically optimal contextual bandit algorithm using hierarchical structures
(Institute of Electrical and Electronics Engineers, 2018) Neyshabouri, Mohammadreza Mohaghegh; Gökçesu, Kaan; Gökçesu, Hakan; Özkan, Hüseyin; Kozat, Süleyman Serdar
We propose an online algorithm for sequential learning in the contextual multi-armed bandit setting. Our approach is to partition the context space and then optimally combine all of the possible mappings between the partition regions and the set of bandit arms in a data-driven manner. We show that in our approach, the best mapping is able to approximate the best arm selection policy to any desired degree under mild Lipschitz conditions. Therefore, we design our algorithm based on the optimal adaptive combination and asymptotically achieve the performance of the best mapping as well as the best arm selection policy. This optimality is guaranteed to hold even in adversarial environments, since we do not rely on any statistical assumptions regarding the contexts or the loss of the bandit arms. Moreover, we design an efficient implementation of our algorithm using various hierarchical partitioning structures, such as lexicographical or arbitrary position splitting and binary trees (BTs), among several other partitioning examples. For instance, in the case of BT partitioning, the computational complexity is only log-linear in the number of regions in the finest partition. In conclusion, we provide significant performance improvements by introducing upper bounds (with respect to the best arm selection policy) that are mathematically proven to vanish, in the average loss per round sense, at a faster rate than the state of the art. Our experimental work extensively covers various scenarios, ranging from bandit settings to multiclass classification with real and synthetic data. In these experiments, we show that our algorithm significantly outperforms state-of-the-art techniques while maintaining the introduced mathematical guarantees and reasonable computational scalability.
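A heavily simplified sketch of the partitioning idea: contexts in [0, 1) are split into equal cells and an independent EXP3 learner runs in each cell. This is an assumption-laden stand-in, not the authors' method; the paper combines all partition-to-arm mappings over hierarchical partitions, whereas this sketch fixes a single uniform partition. The names `PartitionedExp3`, `n_regions`, `eta`, and `gamma` are hypothetical.

```python
import math
import random
from typing import List


class PartitionedExp3:
    """One EXP3 learner per cell of a fixed uniform partition of [0, 1).

    Hypothetical simplification: a single partition replaces the paper's
    combination over all partition-to-arm mappings.
    """

    def __init__(self, n_arms: int, n_regions: int, eta: float = 0.1,
                 gamma: float = 0.1, seed: int = 0) -> None:
        self.n_arms = n_arms
        self.n_regions = n_regions
        self.eta = eta      # learning rate
        self.gamma = gamma  # uniform exploration mixed into the probabilities
        self.rng = random.Random(seed)
        self.log_weights = [[0.0] * n_arms for _ in range(n_regions)]

    def region(self, context: float) -> int:
        # Map a context to its cell; a context of exactly 1.0 goes to the last cell.
        return min(int(context * self.n_regions), self.n_regions - 1)

    def probabilities(self, context: float) -> List[float]:
        lw = self.log_weights[self.region(context)]
        top = max(lw)
        w = [math.exp(x - top) for x in lw]  # shift by the max for numerical stability
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_arms for wi in w]

    def select(self, context: float) -> int:
        return self.rng.choices(range(self.n_arms), weights=self.probabilities(context))[0]

    def update(self, context: float, arm: int, loss: float) -> None:
        # Importance weighting (loss / probability) gives an unbiased loss estimate
        # for the pulled arm under bandit feedback; other arms are left unchanged.
        p = self.probabilities(context)[arm]
        self.log_weights[self.region(context)][arm] -= self.eta * loss / p


if __name__ == "__main__":
    learner = PartitionedExp3(n_arms=2, n_regions=2, seed=1)
    data_rng = random.Random(2)
    for _ in range(2000):
        x = data_rng.random()
        a = learner.select(x)
        best = 0 if x < 0.5 else 1  # made-up environment: best arm depends on context
        learner.update(x, a, 0.0 if a == best else 1.0)
    print(learner.probabilities(0.25), learner.probabilities(0.75))
```

Finer cells reduce approximation error under a Lipschitz assumption but leave each cell with fewer samples; combining partitions of several resolutions, as the abstract describes, is one way to avoid fixing this trade-off in advance.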
Open Access
Distributed online learning via cooperative contextual bandits
(Institute of Electrical and Electronics Engineers, 2015-07-15) Tekin, C.; Schaar, Mihaela van der
In this paper, we propose a novel framework for decentralized online learning by many learners. At each moment of time, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can either select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward, but the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others, and the cost paid for these actions, while taking into account what it has learned about the value of assistance from each other learner. We develop distributed online learning algorithms and provide analytic bounds that compare their efficiency with the complete-knowledge (oracle) benchmark, in which the expected reward of every action in every context is known by every learner. These bounds show that the regret, i.e., the loss incurred by the algorithm relative to the oracle, is sublinear in time. Our theoretical framework can be used in many practical applications, including Big Data mining, event detection in surveillance sensor networks, and distributed online recommendation systems.
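A minimal sketch of one learner's decision for a single context cell, assuming the learner observes the reward it receives from a requested action: its own actions and "ask another learner" options are treated as arms, the request cost is subtracted from the observed reward, and UCB1 picks the arm. The `Option` class, the `ask:<learner>` naming, and the use of UCB1 are assumptions for illustration, not the algorithms analyzed in the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    name: str          # own action, or "ask:<learner>" for a request (hypothetical naming)
    cost: float = 0.0  # cost paid per request; 0 for the learner's own actions
    pulls: int = 0
    total_reward: float = 0.0

    def mean_net_reward(self) -> float:
        return self.total_reward / self.pulls - self.cost


def ucb_choose(options: List[Option], t: int) -> Option:
    # Try every option once, then maximize (mean reward - cost) + exploration bonus.
    for opt in options:
        if opt.pulls == 0:
            return opt
    return max(
        options,
        key=lambda o: o.mean_net_reward() + math.sqrt(2 * math.log(t) / o.pulls),
    )


def record(opt: Option, reward: float) -> None:
    opt.pulls += 1
    opt.total_reward += reward


if __name__ == "__main__":
    opts = [Option("own_a"), Option("own_b"), Option("ask:learner2", cost=0.3)]
    reward_of = {"own_a": 0.1, "own_b": 0.2, "ask:learner2": 1.0}  # made-up rewards
    for t in range(1, 1001):
        chosen = ucb_choose(opts, t)
        record(chosen, reward_of[chosen.name])
    print({o.name: o.pulls for o in opts})
```

With these made-up rewards, asking for help has the highest net value (1.0 minus a cost of 0.3), so most rounds go to the request option even though each request is costly.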
Open Access
Multi-objective contextual bandits with a dominant objective
(IEEE, 2017) Tekin, Cem; Turgay, Eralp
In this paper, we propose a new contextual bandit problem with two objectives, where one of the objectives dominates the other. Unlike single-objective bandit problems, in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB) and prove that it achieves sublinear regret with respect to the optimal context-dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives, ranging from wireless communication to medical diagnosis and recommender systems.
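The optimal-arm rule stated above can be written directly. The sketch assumes the expected rewards per objective for a given context are supplied as lists; a learning algorithm such as MOC-MAB would have to estimate them, which this sketch does not attempt. The tolerance `tol` is a hypothetical knob for treating near-ties in the dominant objective as ties.

```python
from typing import Sequence


def lexicographic_best_arm(dominant: Sequence[float], secondary: Sequence[float],
                           tol: float = 0.0) -> int:
    """Index of the arm that maximizes `secondary` among arms maximizing `dominant`.

    `tol` (hypothetical) treats arms within `tol` of the best dominant value as tied.
    """
    top = max(dominant)
    candidates = [i for i, d in enumerate(dominant) if d >= top - tol]
    return max(candidates, key=lambda i: secondary[i])


if __name__ == "__main__":
    # Arms 0 and 1 tie on the dominant objective; arm 1 is better on the other one.
    print(lexicographic_best_arm([0.9, 0.9, 0.5], [0.2, 0.7, 1.0]))  # prints 1
```

Note that arm 2 has the best non-dominant reward but is never chosen with `tol=0.0`, because it is suboptimal in the dominant objective.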
Open Access
Multi-user small base station association via contextual combinatorial volatile bandits
(IEEE, 2021-03-09) Qureshi, Muhammad Anjum; Nika, Andi; Tekin, Cem
We propose an efficient mobility management solution to the problem of assigning small base stations (SBSs) to multiple mobile data users in a heterogeneous setting. We formalize the problem using a novel sequential decision-making model named contextual combinatorial volatile multi-armed bandits (MABs), in which each association is considered an arm, the volatility of an arm is imposed by the dynamic arrivals of the users, and the context is additional information linked with the user and the SBS, such as the user/SBS distance and the transmission frequency. As next-generation communications are envisioned to take place over highly dynamic links such as the millimeter wave (mmWave) frequency band, we consider the association problem over an unknown channel distribution, with limited feedback in the form of acknowledgments and in the absence of channel state information (CSI). As the links are unknown and dynamically varying, the assignment problem cannot be solved offline. Thus, we propose an online algorithm that solves the user-SBS association problem in a multi-user, time-varying environment in which the number of users varies dynamically over time. Our algorithm strikes a balance between exploration and exploitation and achieves regret that is sublinear in time, with an optimal dependence on the problem structure and the dynamics of user arrivals and departures. In addition, we demonstrate via numerical experiments that our algorithm achieves significant performance gains compared to several benchmark algorithms.
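To make the combinatorial, volatile structure concrete, the sketch below treats each (user, SBS) pair as an arm, scores it with a UCB index computed from acknowledgment counts, and assigns only the users present in the current round greedily under a per-SBS capacity. The greedy matching, the capacity limit, and all function names are assumptions for illustration; contexts are omitted and this is not the paper's algorithm.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (user, SBS): each possible association is one arm


def ucb_index(acks: Dict[Pair, int], pulls: Dict[Pair, int], pair: Pair, t: int) -> float:
    n = pulls.get(pair, 0)
    if n == 0:
        return float("inf")  # associations never tried are explored first
    return acks.get(pair, 0) / n + math.sqrt(2 * math.log(t) / n)


def greedy_associate(users: List[str], sbss: List[str], capacity: int,
                     acks: Dict[Pair, int], pulls: Dict[Pair, int], t: int) -> Dict[str, str]:
    # Only users present in this round are considered (volatile arm set).
    ranked = sorted(
        ((ucb_index(acks, pulls, (u, s), t), u, s) for u in users for s in sbss),
        reverse=True,
    )
    load = {s: 0 for s in sbss}
    assignment: Dict[str, str] = {}
    # Greedy: each user gets at most one SBS, each SBS serves at most `capacity` users.
    for _, u, s in ranked:
        if u not in assignment and load[s] < capacity:
            assignment[u] = s
            load[s] += 1
    return assignment


if __name__ == "__main__":
    acks = {("u1", "A"): 9, ("u1", "B"): 1, ("u2", "A"): 8, ("u2", "B"): 7}  # made-up ACK counts
    pulls = {pair: 10 for pair in acks}
    print(greedy_associate(["u1", "u2"], ["A", "B"], 1, acks, pulls, 40))  # prints {'u1': 'A', 'u2': 'B'}
```

Here the greedy pass gives u1 its best SBS first, which pushes u2 to its second choice; this is one simple way to respect capacity, not necessarily the approach used in the paper.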
Open Access
Multiagent systems: learning, strategic behavior, cooperation, and network formation
(Elsevier, 2018) Tekin, Cem; Zhang, S.; Xu, J.; Schaar, M. van der; Djurić, P. M.; Richard, C.
Many applications ranging from crowdsourcing to recommender systems involve informationally decentralized agents repeatedly interacting with each other in order to reach their goals. These networked agents base their decisions on incomplete information, which they gather through interactions with their neighbors or through cooperation, which is often costly. This chapter presents a discussion of decentralized learning algorithms that enable the agents to achieve their goals through repeated interaction. First, we discuss cooperative online learning algorithms that help the agents discover beneficial connections with each other and exploit these connections to maximize the reward. For this case, we explain the relation between the learning speed, network topology, and cooperation cost. Then, we focus on how informationally decentralized agents form cooperation networks through learning. We explain how learning features prominently in many real-world interactions and greatly affects the evolution of social networks: links that otherwise would not have formed may now appear, and a much greater variety of network configurations can be reached. We show that the impact of learning on efficiency and social welfare can be either positive or negative. We also demonstrate the use of the aforementioned methods in popularity prediction, recommender systems, expert selection, and multimedia content aggregation.
Open Access
Online classification with contextual exponential weights for disease diagnostics
(IEEE, 2017) Ekşioğlu, Kubilay; Qureshi, Muhammad Anjum; Tekin, Cem
In this paper, a novel online classification scheme based on a contextual variant of the Weighted Average Forecaster algorithm is proposed. The proposed method adaptively partitions the data space based on contexts and trades off exploration and exploitation when fusing the predictions of the experts. The proposed algorithm is evaluated on disease data available in the UCI Machine Learning Repository. The results demonstrate the robustness, effectiveness, and versatility of the proposed system for medical diagnostics, in terms of both performance and low computational cost.
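A minimal sketch of a contextual weighted average forecaster, assuming a scalar context in [0, 1], fixed equal-width bins (the paper partitions adaptively), experts that output class-1 probabilities, and squared loss. `ContextualEWAF`, `n_bins`, and `eta` are hypothetical names, and the exploration component of the paper's method is not modeled here.

```python
import math
from typing import Sequence


class ContextualEWAF:
    """Exponentially weighted average forecaster with one weight vector per context bin.

    Hypothetical sketch: fixed equal-width bins over a scalar context in [0, 1].
    """

    def __init__(self, n_experts: int, n_bins: int, eta: float = 2.0) -> None:
        self.eta = eta
        self.n_bins = n_bins
        self.weights = [[1.0] * n_experts for _ in range(n_bins)]

    def _bin(self, context: float) -> int:
        return min(int(context * self.n_bins), self.n_bins - 1)

    def forecast(self, context: float, expert_probs: Sequence[float]) -> float:
        # Weighted average of the experts' predicted probabilities of class 1.
        w = self.weights[self._bin(context)]
        return sum(wi * p for wi, p in zip(w, expert_probs)) / sum(w)

    def classify(self, context: float, expert_probs: Sequence[float]) -> int:
        return 1 if self.forecast(context, expert_probs) >= 0.5 else 0

    def update(self, context: float, expert_probs: Sequence[float], label: int) -> None:
        # Shrink each expert's weight by exp(-eta * squared loss), only in this bin.
        w = self.weights[self._bin(context)]
        for i, p in enumerate(expert_probs):
            w[i] *= math.exp(-self.eta * (p - label) ** 2)
        top = max(w)
        w[:] = [wi / top for wi in w]  # keep the largest weight at 1 so the sum never underflows


if __name__ == "__main__":
    f = ContextualEWAF(n_experts=2, n_bins=2)
    for _ in range(20):
        f.update(0.2, [0.9, 0.1], 1)  # low contexts: expert 0 is right
        f.update(0.8, [0.9, 0.1], 0)  # high contexts: expert 1 is right
    print(f.classify(0.2, [0.9, 0.1]), f.classify(0.8, [0.9, 0.1]))
```

Because each bin keeps its own weights, the same two experts are trusted differently in different regions of the context space, which is the point of making the forecaster contextual.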
Open Access
RELEAF: an algorithm for learning and exploiting relevance
(Cornell University, 2015-02) Tekin, C.; Schaar, Mihaela van der
Recommender systems, medical diagnosis, network security, and similar applications require ongoing learning and decision-making in real time. These applications, and many others, represent perfect examples of the opportunities and difficulties presented by Big Data: the available information often arrives from a variety of sources and has diverse features, so learning from all the sources may be valuable, but integrating what is learned is subject to the curse of dimensionality. This paper develops and analyzes algorithms that allow efficient learning and decision-making while avoiding the curse of dimensionality. We formalize the information available to the learner/decision-maker at a particular time as a context vector, which the learner should consider when taking actions. In general, the context vector is very high dimensional, but in many settings the most relevant information is embedded in only a few relevant dimensions. If these relevant dimensions were known in advance, the problem would be simple, but they are not. Moreover, the relevant dimensions may be different for different actions. Our algorithm learns the relevant dimensions for each action and makes decisions based on what it has learned. Formally, we build on the structure of a contextual multi-armed bandit by adding and exploiting a relevance relation. We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions; in the special case where the relevance relation is single-valued (a function), it reduces to $\tilde{O}(T^{2(\sqrt{2}-1)})$. In the absence of a relevance relation, the best known contextual bandit algorithms achieve regret $\tilde{O}(T^{(D+1)/(D+2)})$, where $D$ is the full dimension of the context vector.
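A quick arithmetic check of the two bounds, comparing only the exponents of T (constants, logarithmic factors, and the dependence on the number of actions are ignored): 2(√2 − 1) ≈ 0.828, while (D + 1)/(D + 2) equals 0.8 for D = 3 and about 0.833 for D = 4. The function names below are hypothetical.

```python
import math


def releaf_exponent() -> float:
    # Time exponent in the single-valued relevance case: 2 * (sqrt(2) - 1)
    return 2 * (math.sqrt(2) - 1)


def full_context_exponent(D: int) -> float:
    # Time exponent of the best known bound without relevance: (D + 1) / (D + 2)
    return (D + 1) / (D + 2)


def smallest_dimension_where_relevance_helps(max_D: int = 100) -> int:
    # First D at which the relevance-free exponent exceeds the RELEAF exponent.
    for D in range(1, max_D + 1):
        if full_context_exponent(D) > releaf_exponent():
            return D
    raise ValueError("no such D up to max_D")


if __name__ == "__main__":
    for D in (1, 2, 3, 4, 10):
        print(D, round(full_context_exponent(D), 3), round(releaf_exponent(), 3))
```

Under this exponent-only comparison, the single-valued relevance bound has the smaller time order once the full context dimension D is at least 4, and the gap widens as D grows because (D + 1)/(D + 2) approaches 1.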
Open Access
User feedback-based online learning for intent classification
(Association for Computing Machinery, 2023-10-09) Gönç, Kaan; Sağlam, Baturay; Dalmaz, Onat; Çukur, Tolga; Kozat, Serdar; Dibeklioğlu, Hamdi
Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user’s query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit or face distributional shifts when new intents, utterances, or data distributions emerge over time, requiring frequent retraining. Online learning methods based on user feedback can overcome this limitation, since they do not need access to the true intents while collecting data and they adapt the model continuously. In this paper, we propose a novel contextual multi-armed bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and jointly learn multimodal representations of encoded text features and intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This two-phase approach enables our method to efficiently adapt to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicates that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving performance competitive with a supervised benchmark trained on an order of magnitude larger labeled dataset.
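The online fine-tuning stage can be illustrated with a bare-bones REINFORCE update. This sketch assumes a linear softmax policy over intents on fixed feature vectors (standing in for the LLM encoder), binary user feedback as the reward, and no sliding window or offline pretraining; `LinearSoftmaxPolicy`, `lr`, and `baseline` are hypothetical names. The update uses only the feedback, never the true intent.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearSoftmaxPolicy:
    """pi(intent | x) = softmax(W x); x stands in for an utterance embedding."""

    def __init__(self, n_features: int, n_intents: int, lr: float = 0.5, seed: int = 0) -> None:
        self.W = [[0.0] * n_features for _ in range(n_intents)]
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, x: Sequence[float]) -> List[float]:
        return softmax([sum(w * xi for w, xi in zip(row, x)) for row in self.W])

    def act(self, x: Sequence[float]) -> int:
        p = self.probs(x)
        return self.rng.choices(range(len(p)), weights=p)[0]

    def reinforce_update(self, x: Sequence[float], action: int, reward: float,
                         baseline: float = 0.0) -> None:
        # REINFORCE: W += lr * (reward - baseline) * grad log pi(action | x),
        # where the gradient for row k is (1[k == action] - pi(k | x)) * x.
        p = self.probs(x)
        advantage = reward - baseline
        for k, row in enumerate(self.W):
            g = (1.0 if k == action else 0.0) - p[k]
            for j, xj in enumerate(x):
                row[j] += self.lr * advantage * g * xj


if __name__ == "__main__":
    policy = LinearSoftmaxPolicy(n_features=2, n_intents=3)
    examples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]  # (features, hidden true intent)
    for step in range(500):
        x, true_intent = examples[step % 2]
        a = policy.act(x)
        feedback = 1.0 if a == true_intent else 0.0  # simulated user accept/reject
        policy.reinforce_update(x, a, feedback)
    print([round(p, 3) for p in policy.probs([1.0, 0.0])])
```

With a zero baseline only accepted predictions move the weights; subtracting a running average of the reward as `baseline` is a common way to reduce the variance of this estimator.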
