Locality-aware and load-balanced static task scheduling for MapReduce

Selvitopu, Oğuz; Demirci, Gündüz Vehbi; Türk, Ata; Aykanat, Cevdet

Available

The embargo period has ended, and this item is now available.

Files

Locality-aware_and_load-balanced_static_task_scheduling_for_MapReduce.pdf (937.58 KB)

Date

2018

Authors

Selvitopu, Oğuz

Demirci, Gündüz Vehbi

Türk, Ata

Aykanat, Cevdet

Abstract

Task scheduling for MapReduce jobs has been an active area of research, with the objective of decreasing the amount of data transferred during the shuffle phase by exploiting data locality. In the literature, generally only the scheduling of reduce tasks is considered, under the assumption that the scheduling of map tasks is already determined by the input data placement. However, in cloud or HPC deployments of MapReduce, the input data is located in remote storage, and the scheduling of map tasks gains importance. Here, we propose models for simultaneous scheduling of map and reduce tasks in order to improve data locality and balance the processors’ loads in both the map and reduce phases. Our approach is based on graph and hypergraph models that correctly encode the interactions between map and reduce tasks. Partitions produced by these models are decoded to schedule map and reduce tasks. A two-constraint formulation used in these models enables balancing the processors’ loads in both the map and reduce phases. The partitioning objective in the hypergraph models correctly encapsulates the minimization of data transfer when a local combine step is performed prior to the shuffle, whereas the partitioning objective in the graph models does the same when no local combine is performed. We show the validity of our scheduling on the MapReduce parallelizations of two important kernel operations – sparse matrix–vector multiplication (SpMV) and generalized sparse matrix–matrix multiplication (SpGEMM) – that are widely encountered in big data analytics and scientific computations. Compared to random scheduling, our models reduce shuffle-phase data traffic from several hundred megabytes to just a few megabytes, which leads to speedups of up to 2.6x and 4.2x for SpMV and SpGEMM, respectively.
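The difference between the two partitioning objectives can be illustrated with a small sketch. This is not the authors' implementation. It only evaluates a given schedule, under the assumptions that every map task emits one unit-size value per reduce task it feeds, and that a local combine merges all values a processor holds for the same reduce task into a single value. The names `shuffle_volume` and `phase_loads` and the toy task sets are hypothetical.

```python
from collections import defaultdict


def shuffle_volume(edges, map_proc, red_proc, combine):
    """Units of data sent over the network in the shuffle phase.

    edges: (map_task, reduce_task) pairs, one unit-size value each
           (unit size is an assumption of this sketch).
    map_proc / red_proc: dicts mapping each task to its processor.
    """
    if not combine:
        # No combine: every value whose producer and consumer are on
        # different processors is transferred individually.
        return sum(1 for m, r in edges if map_proc[m] != red_proc[r])
    # With a local combine: each remote processor sends at most one
    # combined value per reduce task.
    senders = defaultdict(set)
    for m, r in edges:
        if map_proc[m] != red_proc[r]:
            senders[r].add(map_proc[m])
    return sum(len(procs) for procs in senders.values())


def phase_loads(task_proc, weights, num_procs):
    """Per-processor load of one phase (map or reduce)."""
    loads = [0] * num_procs
    for task, proc in task_proc.items():
        loads[proc] += weights[task]
    return loads


# Toy instance: four map tasks feeding two reduce tasks.
edges = [("m0", "r0"), ("m1", "r0"), ("m2", "r0"), ("m2", "r1"), ("m3", "r1")]
red_proc = {"r0": 0, "r1": 1}
local_maps = {"m0": 0, "m1": 0, "m2": 1, "m3": 1}   # locality-aware placement
remote_maps = {"m0": 1, "m1": 1, "m2": 0, "m3": 0}  # poor placement

for name, map_proc in [("local", local_maps), ("remote", remote_maps)]:
    print(name,
          shuffle_volume(edges, map_proc, red_proc, combine=False),
          shuffle_volume(edges, map_proc, red_proc, combine=True),
          phase_loads(map_proc, {m: 1 for m in map_proc}, 2))
```

Both placements give each processor the same map load, yet they differ in shuffle volume, which is why locality and balance have to be handled together. Keeping a separate load vector per phase mirrors the idea of the two-constraint formulation, where map and reduce loads are balanced independently.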

Source Title

Future Generation Computer Systems

Publisher

Elsevier

Keywords

Map, Reduce, Scheduling, Data locality, Load balance, Map task, Reduce task

Permalink

http://hdl.handle.net/11693/53019

Published Version (Please cite this version)

https://dx.doi.org/10.1016/j.future.2018.06.035

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
