Big-data streaming applications scheduling based on staged multi-armed bandits

Kanoun, K.; Tekin, C.; Atienza, D.; Van Der Schaar, M.

Files

Big-Data Streaming Applications Scheduling Based on Staged Multi-Armed Bandits.pdf (2.34 MB)

Date

2016

Authors

Kanoun, K.

Tekin, C.

Atienza, D.

Van Der Schaar, M.

Source Title

IEEE Transactions on Computers

Print ISSN

0018-9340

Publisher

Institute of Electrical and Electronics Engineers

Volume

65

Issue

12

Pages

3591 - 3605

Language

English

Type

Article

Abstract

Several techniques have recently been proposed to adapt Big-Data streaming applications to existing many-core platforms. Among these, online reinforcement learning methods learn at run-time how to adapt the throughput and the resources allocated to the various streaming tasks, depending on dynamically changing data stream characteristics and the desired application performance (e.g., accuracy). However, most state-of-the-art techniques consider only a single input stream in their application model and assume that the system knows the amount of resources to allocate to each task to achieve a desired performance. To address these limitations, in this paper we propose a new systematic and efficient methodology and associated algorithms for online learning and energy-efficient scheduling of Big-Data streaming applications with multiple streams on many-core systems with resource constraints. We formalize the problem of multi-stream scheduling as a staged decision problem in which the performance obtained for various resource allocations is unknown. The proposed scheduling methodology uses a novel class of online adaptive learning techniques which we refer to as staged multi-armed bandits (S-MAB). Our scheduler learns online which processing method to assign to each stream and how to allocate its resources over time in order to maximize performance at run-time, without access to any offline information. Applied to a face detection streaming application, the proposed scheduler achieves performance similar to that of an optimal semi-online solution that has full knowledge of the input stream: the differences in throughput, observed quality, resource usage and energy efficiency are less than 1, 0.3, 0.2 and 4 percent, respectively.
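
For readers unfamiliar with bandit-based online learning, the following is a minimal sketch of a generic UCB1 multi-armed bandit, in which each arm could stand for one candidate processing method for a stream. It is not the S-MAB algorithm from the paper: it does not model the staging, the multiple streams, or the resource constraints described in the abstract. The class name `UCB1`, the helper `simulate`, and the Bernoulli reward model are assumptions made purely for illustration.

```python
import math
import random


class UCB1:
    """Generic UCB1 bandit (illustrative only; not the paper's S-MAB)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was played
        self.means = [0.0] * n_arms   # running mean reward of each arm

    def select(self):
        # Play every arm once before relying on confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Score = empirical mean + exploration bonus that shrinks with plays.
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, arm, reward):
        # Incremental update of the running mean for the played arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_means, rounds, seed=0):
    """Run UCB1 against hypothetical Bernoulli arms with the given means."""
    rng = random.Random(seed)
    bandit = UCB1(len(true_means))
    for _ in range(rounds):
        arm = bandit.select()
        # Assumed reward model: 1.0 with probability true_means[arm], else 0.0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        bandit.update(arm, reward)
    return bandit
```

Calling `simulate([0.1, 0.3, 0.9], rounds=3000)` returns a bandit whose `counts` show how often each hypothetical method was tried; with rewards observed only after each choice, the learner concentrates its plays on the arm with the highest mean without any offline profiling.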