Browsing by Author "Tekin, Cem"
Now showing 1 - 20 of 38
Results Per Page
Sort Options
Item Open Access Actionable intelligence and online learning for semantic computing(World Scientific Publishing Company, 2017) Tekin, Cem; van der Schaar, M.As the world becomes more connected and instrumented, high dimensional, heterogeneous and time-varying data streams are collected and need to be analyzed on the fly to extract the actionable intelligence from the data streams and make timely decisions based on this knowledge. This requires that appropriate classifiers are invoked to process the incoming streams and find the relevant knowledge. Thus, a key challenge becomes choosing online, at run-time, which classifier should be deployed to make the best possible predictions on the incoming streams. In this paper, we survey a class of methods capable to perform online learning in stream-based semantic computing tasks: multi-armed bandits (MABs). Adopting MABs for stream mining poses, numerous new challenges requires many new innovations. Most importantly, the MABs will need to explicitly consider and track online the time-varying characteristics of the data streams and to learn fast what is the relevant information out of the vast, heterogeneous and possibly highly dimensional data streams. In this paper, we discuss contextual MAB methods, which use similarities in context (meta-data) information to make decisions, and discuss their advantages when applied to stream mining for semantic computing. These methods can be adapted to discover in real-time the relevant contexts guiding the stream mining decisions, and tract the best classifier in presence of concept drift. Moreover, we also discuss how stream mining of multiple data sources can be performed by deploying cooperative MAB solutions and ensemble learning. We conclude the paper by discussing the numerous other advantages of MABs that will benefit semantic computing applications.Item Open Access Adaptive contextual learning for unit commitment in microgrids with renewable energy sources(Institute of Electrical and Electronics Engineers, 2018) Lee, H. -S.; Tekin, Cem; van der, Schaar, M.; Lee, J. -W.In this paper, we study a unit commitment (UC) problem where the goal is to minimize the operating costs of a microgrid that involves renewable energy sources. Since traditional UC algorithms use a priori information about uncertainties such as the load demand and the renewable power outputs, their performances highly depend on the accuracy of the a priori information, especially in microgrids due to their limited scale and size. This makes the algorithms impractical in settings where the past data are not sufficient to construct an accurate prior of the uncertainties. To resolve this issue, we develop an adaptively partitioned contextual learning algorithm for UC (AP-CLUC) that learns the best UC schedule and minimizes the total cost over time in an online manner without requiring any a priori information. AP-CLUC effectively learns the effects of the uncertainties on the cost by adaptively considering context information strongly correlated with the uncertainties, such as the past load demand and weather conditions. For AP-CLUC, we first prove an analytical bound on the performance, which shows that its average total cost converges to that of the optimal policy with perfect a priori information. Then, we show via simulations that AP-CLUC achieves competitive performance with respect to the traditional UC algorithms with perfect a priori information, and it achieves better performance than them even with small errors on the information. These results demonstrate the effectiveness of utilizing the context information and the adaptive management of the past data for the UC problem.Item Open Access Adaptive ensemble learning with confidence bounds for personalized diagnosis(AAAI Press, 2016) Tekin, Cem; Yoon, J.; Van Der Schaar, M.With the advances in the field of medical informatics, automated clinical decision support systems are becoming the de facto standard in personalized diagnosis. In order to establish high accuracy and confidence in personalized diagnosis, massive amounts of distributed, heterogeneous, correlated and high-dimensional patient data from different sources such as wearable sensors, mobile applications, Electronic Health Record (EHR) databases etc. need to be processed. This requires learning both locally and globally due to privacy constraints and/or distributed nature of the multimodal medical data. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally-collected data instances, and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, these guarantees are asymptotic. None of these existing works provide confidence estimates about the issued predictions or rate of learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long run (asymptotic) and short run (rate of learning) performance guarantees. Moreover, we show that our proposed method outperforms all existing ensemble learning techniques, even in the presence of concept drift.Item Open Access Analysis of thompson sampling for combinatorial multi-armed bandit with probabilistically triggered arms(PLMR, 2020) Hüyük, Alihan; Tekin, CemWe analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting. We assume that the learner has access to an exact optimization oracle but does not know the expected base arm outcomes beforehand. When the expected reward function is Lipschitz continuous in the expected base arm outcomes, we derive O( Pm i=1 log T /(pii)) regret bound for CTS, where m denotes the number of base arms, pi denotes the minimum non-zero triggering probability of base arm i and i denotes the minimum suboptimality gap of base arm i. We also compare CTS with combinatorial upper confidence bound (CUCB) via numerical experiments on a cascading bandit problem.Item Open Access The biobjective multiarmed bandit: learning approximate lexicographic optimal allocations(TÜBİTAK, 2019) Tekin, CemWe consider a biobjective sequential decision-making problem where an allocation (arm) is called ϵ lexicographic optimal if its expected reward in the first objective is at most ϵ smaller than the highest expected reward, and its expected reward in the second objective is at least the expected reward of a lexicographic optimal arm. The goal of the learner is to select arms that are ϵ lexicographic optimal as much as possible without knowing the arm reward distributions beforehand. For this problem, we first show that the learner’s goal is equivalent to minimizing the ϵ lexicographic regret, and then, propose a learning algorithm whose ϵ lexicographic gap-dependent regret is bounded and gap-independent regret is sublinear in the number of rounds with high probability. Then, we apply the proposed model and algorithm for dynamic rate and channel selection in a cognitive radio network with imperfect channel sensing. Our results show that the proposed algorithm is able to learn the approximate lexicographic optimal rate–channel pair that simultaneously minimizes the primary user interference and maximizes the secondary user throughput.Item Open Access Combinatorial Gaussian process bandits with probabilistically triggered arms(Microtome Publishing, 2021) Demirel, İlker; Tekin, CemCombinatorial bandit models and algorithms are used in many sequential decision-making tasks ranging from item list recommendation to influence maximization. Typical algorithms proposed for combinatorial bandits, including combinatorial UCB (CUCB) and combinatorial Thompson sampling (CTS) do not exploit correlations between base arms during the learning process. Moreover, their regret is usually analyzed under independent base arm outcomes. In this paper, we use Gaussian Processes (GPs) to model correlations between base arms. In particular, we consider a combinatorial bandit model with probabilistically triggered arms, and assume that the expected base arm outcome function is a sample from a GP. We assume that the learner has access to an exact computation oracle, which returns an optimal solution given expected base arm outcomes, and analyze the regret of Combinatorial Gaussian Process Upper Confidence Bound (ComGP-UCB) algorithm for this setting. Under (triggering probability modulated) Lipschitz continuity assumption on the expected reward function, we derive (O( \sqrt{m T \log T \gamma_{T, \boldsymbol{\mu}}^{PTA}})) O(m \sqrt{\frac{T \log T}{p^*}}) upper bounds for the regret of ComGP-UCB that hold with high probability, where m denotes the number of base arms, p^* denotes the minimum non-zero triggering probability, and \gamma_{T, \boldsymbol{\mu}}^{PTA} denotes the pseudo-information gain. Finally, we show via simulations that when the correlations between base arm outcomes are strong, ComGP-UCB significantly outperforms CUCB and CTS.Item Open Access Combinatorial multi-armed bandit problem with probabilistically triggered arms: a case with bounded regret(IEEE, 2017-11) Sarıtaç, A. Ömer; Tekin, CemIn this paper, we study the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a simple greedy policy, named greedy CMAB (G-CMAB), achieves bounded regret. This improves the result in previous work, which shows that the regret is O (log T) under no such assumption on the ATPs. Then, we numerically show that G-CMAB achieves bounded regret in a real-world movie recommendation problem, where the action corresponds to recommending a set of movies, arms correspond to the edges between movies and users, and the goal is to maximize the total number of users that are attracted by at least one movie. In addition to this problem, our results directly apply to the online influence maximization (OIM) problem studied in numerous prior works.Item Open Access Conservative policy construction using variational autoencoders for logged data with missing values(Institute of Electrical and Electronics Engineers Inc., 2022-01-10) Abroshan, M.; Yip, K. H.; Tekin, Cem; Van Der Schaar, M.In high-stakes applications of data-driven decision-making such as healthcare, it is of paramount importance to learn a policy that maximizes the reward while avoiding potentially dangerous actions when there is uncertainty. There are two main challenges usually associated with this problem. First, learning through online exploration is not possible due to the critical nature of such applications. Therefore, we need to resort to observational datasets with no counterfactuals. Second, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features. In this article, we consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features in both training and test data. The goal is to recommend an action (treatment) when ~X, a degraded version of Xwith missing values, is observed. We consider three strategies for dealing with missingness. In particular, we introduce the conservative strategy where the policy is designed to safely handle the uncertainty due to missingness. In order to implement this strategy, we need to estimate posterior distribution p(X|~X) and use a variational autoencoder to achieve this. In particular, our method is based on partial variational autoencoders (PVAEs) that are designed to capture the underlying structure of features with missing values.Item Open Access Context-aware hierarchical online learning for performance maximization in mobile crowdsourcing(Institute of Electrical and Electronics Engineers, 2018) Muller, S. K.; Tekin, Cem; Schaar, M.; Klein, A.In mobile crowdsourcing (MCS), mobile users accomplish outsourced human intelligence tasks. MCS requires an appropriate task assignment strategy, since different workers may have different performance in terms of acceptance rate and quality. Task assignment is challenging, since a worker's performance 1) may fluctuate, depending on both the worker's current personal context and the task context and 2) is not known a priori, but has to be learned over time. Moreover, learning context-specific worker performance requires access to context information, which may not be available at a central entity due to communication overhead or privacy concerns. In addition, evaluating worker performance might require costly quality assessments. In this paper, we propose a context-aware hierarchical online learning algorithm addressing the problem of performance maximization in MCS. In our algorithm, a local controller (LC) in the mobile device of a worker regularly observes the worker's context, her/his decisions to accept or decline tasks and the quality in completing tasks. Based on these observations, the LC regularly estimates the worker's context-specific performance. The mobile crowdsourcing platform (MCSP) then selects workers based on performance estimates received from the LCs. This hierarchical approach enables the LCs to learn context-specific worker performance and it enables the MCSP to select suitable workers. In addition, our algorithm preserves worker context locally, and it keeps the number of required quality assessments low. We prove that our algorithm converges to the optimal task assignment strategy. Moreover, the algorithm outperforms simpler task assignment strategies in experiments based on synthetic and real data.Item Open Access Contextual learning for unit commitment with renewable energy sources(IEEE, 2017) Lee, H. -S.; Tekin, Cem; Schaar, M.; Lee, J. -W.In this paper, we study a unit commitment (UC) problem minimizing operating costs of the power system with renewable energy sources. We develop a contextual learning algorithm for UC (CLUC) which learns which UC schedule to choose based on the context information such as past load demand and weather condition. CLUC does not require any prior knowledge on the uncertainties such as the load demand and the renewable power outputs, and learns them over time using the context information. We characterize the performance of CLUC analytically, and prove its optimality in terms of the long-term average cost. Through the simulation results, we show the performance of CLUC and the effectiveness of utilizing the context information in the UC problem.Item Open Access Decentralized dynamic rate and channel selection over a shared spectrum(IEEE, 2021-03-15) Javanmardi, Alireza; Qureshi, Muhammad Anjum; Tekin, CemWe consider the problem of distributed dynamic rate and channel selection in a multi-user network, in which each user selects a wireless channel and a modulation and coding scheme (corresponds to a transmission rate) in order to maximize the network throughput. We assume that the users are cooperative, however, there is no coordination and communication among them, and the number of users in the system is unknown. We formulate this problem as a multi-player multi-armed bandit problem and propose a decentralized learning algorithm that performs almost optimal exploration of the transmission rates to learn fast. We prove that the regret of our learning algorithm with respect to the optimal allocation increases logarithmically over rounds with a leading term that is logarithmic in the number of transmission rates. Finally, we compare the performance of our learning algorithm with the state-of-the-art via simulations and show that it substantially improves the throughput and minimizes the number of collisions.Item Open Access eTutor: online learning for personalized education(IEEE, 2015-04) Tekin, Cem; Braun, J.; Schaar, Mihaela van derGiven recent advances in information technology and artificial intelligence, web-based education systems have became complementary and, in some cases, viable alternatives to traditional classroom teaching. The popularity of these systems stems from their ability to make education available to a large demographics (see MOOCs). However, existing systems do not take advantage of the personalization which becomes possible when web-based education is offered: they continue to be one-size-fits-all. In this paper, we aim to provide a first systematic method for designing a personalized web-based education system. Personalizing education is challenging: (i) students need to be provided personalized teaching and training depending on their contexts (e.g. classes already taken, methods of learning preferred, etc.), (ii) for each specific context, the best teaching and training method (e.g type and order of teaching materials to be shown) must be learned, (iii) teaching and training should be adapted online, based on the scores/feedback (e.g. tests, quizzes, final exam, likes/dislikes etc.) of the students. Our personalized online system, e-Tutor, is able to address these challenges by learning how to adapt the teaching methodology (in this case what sequence of teaching material to present to a student) to maximize her performance in the final exam, while minimizing the time spent by the students to learn the course (and possibly dropouts). We illustrate the efficiency of the proposed method on a real-world eTutor platform which is used for remedial training for a Digital Signal Processing (DSP) course. © 2015 IEEE.Item Open Access Exploiting relevance for online decision-making in high-dimensions(IEEE, 2020) Turgay, Eralp; Bulucu, Cem; Tekin, CemMany sequential decision-making tasks require choosing at each decision step the right action out of the vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the times, the high-dimensionality of actions and data makes learning of the optimal actions by traditional learning methods impracticable. In this work, we investigate how to discover and leverage sparsity in actions and data to enable fast learning. As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set, possibly in a non-linear way. We depart from the prior work by assuming a high-dimensional, continuum set of arms, and allow relevant context dimensions to vary for each arm. We propose a new online learning algorithm called CMAB with Relevance Learning (CMAB-RL). CMAB-RL enjoys a substantially improved regret bound compared to classical CMAB algorithms whose regrets depend on the number of dimensions dx and da of the context and arm sets. Importantly, we show that when the learner has prior knowledge on sparsity, given in terms of upper bounds d¯¯¯x and d¯¯¯a on the number of relevant context and arm dimensions, then CMAB-RL achieves O~(T1−1/(2+2d¯¯¯x+d¯¯¯a)) regret. Finally, we illustrate how CMAB algorithms can be used for optimal personalized blood glucose control in type 1 diabetes mellitus patients, and show that CMAB-RL outperforms other contextual MAB algorithms in this task.Item Open Access Fast learning for dynamic resource allocation in AI-Enabled radio networks(IEEE, 2020) Qureshi, Muhammad Anjum; Tekin, CemArtificial Intelligence (AI)-enabled radios are expected to enhance the spectral efficiency of 5th generation (5G) millimeter wave (mmWave) networks by learning to optimize network resources. However, allocating resources over the mmWave band is extremely challenging due to rapidly-varying channel conditions. We consider several resource allocation problems for mmWave radio networks under unknown channel statistics and without any channel state information (CSI) feedback: i) dynamic rate selection for an energy harvesting transmitter, ii) dynamic power allocation for heterogeneous applications, and iii) distributed resource allocation in a multi-user network. All of these problems exhibit structured payoffs which are unimodal functions over partially ordered arms (transmission parameters) as well as over partially ordered contexts (side-information). Unimodality over arms helps in reducing the number of arms to be explored, while unimodality over contexts helps in using past information from nearby contexts to make better selections. We model this as a structured reinforcement learning problem, called contextual unimodal multi-armed bandit (MAB), and propose an online learning algorithm that exploits unimodality to optimize the resource allocation over time, and prove that it achieves logarithmic in time regret. Our algorithm's regret scales sublinearly both in the number of arms and contexts for a wide range of scenarios. We also show via simulations that our algorithm significantly improves the performance in the aforementioned resource allocation problems.Item Open Access Feedback adaptive learning for medical and educational application recommendation(IEEE, 2020) Tekin, Cem; Elahi, Sepehr; Van Der Schaar, M.Recommending applications (apps) to improve health or educational outcomes requires long-term planning and adaptation based on the user feedback, as it is imperative to recommend the right app at the right time to improve engagement and benefit. We model the challenging task of app recommendation for these specific categories of apps-or alike-using a new reinforcement learning method referred to as episodic multi-armed bandit (eMAB). In eMAB, the learner recommends apps to individual users and observes their interactions with the recommendations on a weekly basis. It then uses this data to maximize the total payoff of all users by learning to recommend specific apps. Since computing the optimal recommendation sequence is intractable, as a benchmark, we define an oracle that sequentially recommends apps to maximize the expected immediate gain. Then, we propose our online learning algorithm, named FeedBack Adaptive Learning (FeedBAL), and prove that its regret with respect to the benchmark increases logarithmically in expectation. We demonstrate the effectiveness of FeedBAL on recommending mental health apps based on data from an app suite and show that it results in a substantial increase in the number of app sessions compared with episodic versions of ϵn -greedy, Thompson sampling, and collaborative filtering methods.Item Open Access Finding it now: networked classifiers in real-time stream mining systems(Springer, Cham, 2019) Ducasse, R.; Tekin, Cem; van der Schaar; Bhattacharyya, S. S.; Deprettere, Ed. F.; Leupers, R.; Takala, J.The aim of this chapter is to describe and optimize the specifications of signal processing systems, aimed at extracting in real time valuable information out of large-scale decentralized datasets. A first section will explain the motivations and stakes and describe key characteristics and challenges of stream mining applications. We then formalize an analytical framework which will be used to describe and optimize distributed stream mining knowledge extraction from large scale streams. In stream mining applications, classifiers are organized into a connected topology mapped onto a distributed infrastructure. We will study linear chains and optimise the ordering of the classifiers to increase accuracy of classification and minimise delay. We then present a decentralized decision framework for joint topology construction and local classifier configuration. In many cases, accuracy of classifiers are not known beforehand. In the last section, we look at how to learn online the classifiers characteristics without increasing computation overhead. Stream mining is an active field of research, at the crossing of various disciplines, including multimedia signal processing, distributed systems, machine learning etc. As such, we will indicate several areas for future research and development.Item Open Access Functional contour-following via haptic perception and reinforcement learning(Institute of Electrical and Electronics Engineers, 2018) Hellman, R. B.; Tekin, Cem; Schaar, M. V.; Santos, V. J.Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: The closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural net classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration versus exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by more efficiently exploring the state-action space while learning a hard-to-code task. The learned C-MAB policy was tested with novel ziplock bag scenarios and contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.Item Open Access Gambler's ruin bandit problem(IEEE, 2017) Akbarzadeh, Nima; Tekin, CemIn this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state; and a terminal action that moves the learner directly into one of the two terminal states (goal and dead-end state). The current round ends when a terminal state is reached, and the learner incurs a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (expected number of times the goal state is reached), without having any prior knowledge on the state transition probabilities. We first prove a result on the form of the optimal policy for the GRBP. Then, we define the regret of the learner with respect to an omnipotent oracle, which acts optimally in each round, and prove that it increases logarithmically over rounds. We also identify a condition under which the learner's regret is bounded. A potential application of the GRBP is optimal medical treatment assignment, in which the continuation action corresponds to a conservative treatment and the terminal action corresponds to a risky treatment such as surgery.Item Open Access Generalized global bandit and its application in cellular coverage optimization(Institute of Electrical and Electronics Engineers, 2018) Shen, C.; Zhou, R.; Tekin, Cem; Schaar, M. V. D.Motivated by the engineering problem of cellular coverage optimization, we propose a novel multiarmed bandit model called generalized global bandit. We develop a series of greedy algorithms that have the capability to handle nonmonotonic but decomposable reward functions, multidimensional global parameters, and switching costs. The proposed algorithms are rigorously analyzed under the multiarmed bandit framework, where we show that they achieve bounded regret, and hence, they are guaranteed to converge to the optimal arm in finite time. The algorithms are then applied to the cellular coverage optimization problem to achieve the optimal tradeoff between sufficient small cell coverage and limited macroleakage without prior knowledge of the deployment environment. The performance advantage of the new algorithms over existing bandits solutions is revealed analytically and further confirmed via numerical simulations. The key element behind the performance improvement is a more efficient 'trial and error' mechanism, in which any trial will help improve the knowledge of all candidate power levels.Item Open Access Global bandits(Institute of Electrical and Electronics Engineers, 2018) Atan, O.; Tekin, Cem; Schaar, M. V. D.Multiarmed bandits (MABs) model sequential decision-making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior works on MAB assume that the reward distributions of each arm are independent. But in a wide variety of decision problems - from drug dosage to dynamic pricing - the expected rewards of different arms are correlated, so that selecting one arm provides information about the expected rewards of other arms as well. We propose and analyze a class of models of such decision problems, which we call global bandits (GB). In the case in which rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves bounded regret, with a bound that depends on the single true parameter of the problem. Hence, this policy selects suboptimal arms only finitely many times with probability one. For this case, we also obtain a bound on regret that is independent of the true parameter; this bound is sublinear, with an exponent that depends on the informativeness of the arms. We also propose a variant of the greedy policy that achieves O(√T) worst case and O(1) parameter-dependent regret. Finally, we perform experiments on dynamic pricing and show that the proposed algorithms achieve significant gains with respect to the well-known benchmarks.