Big-data streaming applications scheduling based on staged multi-armed bandits

Date
2016
Authors
Kanoun, K.
Tekin, C.
Atienza, D.
Van Der Schaar, M.
Editor(s)
Advisor
Supervisor
Co-Advisor
Co-Supervisor
Instructor
Source Title
IEEE Transactions on Computers
Print ISSN
0018-9340
Electronic ISSN
Publisher
Institute of Electrical and Electronics Engineers
Volume
65
Issue
12
Pages
3591 - 3605
Language
English
Journal Title
Journal ISSN
Volume Title
Series
Abstract

Several techniques have been recently proposed to adapt Big-Data streaming applications to existing many core platforms. Among these techniques, online reinforcement learning methods have been proposed that learn how to adapt at run-time the throughput and resources allocated to the various streaming tasks depending on dynamically changing data stream characteristics and the desired applications performance (e.g., accuracy). However, most of state-of-the-art techniques consider only one single stream input in its application model input and assume that the system knows the amount of resources to allocate to each task to achieve a desired performance. To address these limitations, in this paper we propose a new systematic and efficient methodology and associated algorithms for online learning and energy-efficient scheduling of Big-Data streaming applications with multiple streams on many core systems with resource constraints. We formalize the problem of multi-stream scheduling as a staged decision problem in which the performance obtained for various resource allocations is unknown. The proposed scheduling methodology uses a novel class of online adaptive learning techniques which we refer to as staged multi-armed bandits (S-MAB). Our scheduler is able to learn online which processing method to assign to each stream and how to allocate its resources over time in order to maximize the performance on the fly, at run-time, without having access to any offline information. The proposed scheduler, applied on a face detection streaming application and without using any offline information, is able to achieve similar performance compared to an optimal semi-online solution that has full knowledge of the input stream where the differences in throughput, observed quality, resource usage and energy efficiency are less than 1, 0.3, 0.2 and 4 percent respectively.

Course
Other identifiers
Book Title
Citation
Published Version (Please cite this version)