Locality-aware and load-balanced static task scheduling for MapReduce

buir.contributor.author: Selvitopu, Oğuz
buir.contributor.author: Demirci, Gündüz Vehbi
buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: 61
dc.citation.spage: 49
dc.citation.volumeNumber: 90
dc.contributor.author: Selvitopu, Oğuz
dc.contributor.author: Demirci, Gündüz Vehbi
dc.contributor.author: Türk, Ata
dc.contributor.author: Aykanat, Cevdet
dc.date.accessioned: 2020-02-04T05:48:58Z
dc.date.available: 2020-02-04T05:48:58Z
dc.date.issued: 2018
dc.department: Department of Computer Engineering
dc.description.abstract: Task scheduling for MapReduce jobs has been an active area of research with the objective of decreasing the amount of data transferred during the shuffle phase by exploiting data locality. In the literature, generally only the scheduling of reduce tasks is considered, with the assumption that the scheduling of map tasks is already determined by the input data placement. However, in cloud or HPC deployments of MapReduce, the input data is located in remote storage and scheduling map tasks gains importance. Here, we propose models for the simultaneous scheduling of map and reduce tasks in order to improve data locality and balance the processors’ loads in both the map and reduce phases. Our approach is based on graph and hypergraph models which correctly encode the interactions between map and reduce tasks. Partitions produced by these models are decoded to schedule map and reduce tasks. A two-constraint formulation utilized in these models enables balancing processors’ loads in both map and reduce phases. The partitioning objective in the hypergraph models correctly encapsulates the minimization of data transfer when a local combine step is performed prior to shuffle, whereas the partitioning objective in the graph models achieves the same feat when a local combine is not performed. We show the validity of our scheduling on the MapReduce parallelizations of two important kernel operations – sparse matrix–vector multiplication (SpMV) and generalized sparse matrix–matrix multiplication (SpGEMM) – that are widely encountered in big data analytics and scientific computations. Compared to random scheduling, our models lead to tremendous savings in data transfer by reducing data traffic from several hundreds of megabytes to just a few megabytes in the shuffle phase, consequently leading to up to 2.6x and 4.2x speedups for SpMV and SpGEMM, respectively.
dc.description.sponsorship: Research Council of Turkey (TUBITAK)
dc.embargo.release: 2021-01-01
dc.identifier.doi: 10.1016/j.future.2018.06.035
dc.identifier.issn: 0167-739X
dc.identifier.uri: http://hdl.handle.net/11693/53019
dc.language.iso: English
dc.publisher: Elsevier
dc.relation.isversionof: https://dx.doi.org/10.1016/j.future.2018.06.035
dc.source.title: Future Generation Computer Systems
dc.subject: Map
dc.subject: Reduce
dc.subject: Scheduling
dc.subject: Data locality
dc.subject: Load balance
dc.subject: Map task
dc.subject: Reduce task
dc.title: Locality-aware and load-balanced static task scheduling for MapReduce
dc.type: Article
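
A minimal sketch of the two-constraint scheduling model described in the abstract: map and reduce tasks are vertices, each carrying a (map-load, reduce-load) weight pair; shuffle traffic between a map task and a reduce task defines a weighted edge; and a partition of the vertices among processors is decoded into a schedule. The helper names (build_model, greedy_two_constraint_schedule), the input format, and the greedy placement heuristic are assumptions of this sketch only; the paper itself decodes partitions obtained from multi-constraint graph/hypergraph partitioning rather than a greedy pass.

from collections import defaultdict

def build_model(map_tasks, reduce_tasks, shuffle):
    """Build a toy graph model.

    map_tasks / reduce_tasks: {task_id: computational load}
    shuffle: {(map_id, reduce_id): volume of intermediate data}
    """
    # Each vertex gets two weights: (map-phase load, reduce-phase load).
    vertices = {}
    for m, load in map_tasks.items():
        vertices[m] = (load, 0.0)
    for r, load in reduce_tasks.items():
        vertices[r] = (0.0, load)
    # Undirected weighted adjacency: edge weight = shuffle volume between tasks.
    adj = defaultdict(dict)
    for (m, r), vol in shuffle.items():
        adj[m][r] = adj[m].get(r, 0.0) + vol
        adj[r][m] = adj[m][r]
    return vertices, adj

def greedy_two_constraint_schedule(vertices, adj, num_procs, imbalance=1.10):
    """Toy stand-in for a multi-constraint partitioner: assign each task to the
    processor that already holds most of its neighbours (locality), among the
    processors that stay within the allowed imbalance in *both* the map and
    the reduce load."""
    total = [sum(w[i] for w in vertices.values()) for i in (0, 1)]
    cap = [imbalance * t / num_procs for t in total]   # per-constraint capacity
    loads = [[0.0, 0.0] for _ in range(num_procs)]
    schedule = {}
    # Place heavier tasks first so balance is easier to maintain.
    for v in sorted(vertices, key=lambda t: -sum(vertices[t])):
        wm, wr = vertices[v]
        # Shuffle volume that stays local if v joins each processor.
        gain = [0.0] * num_procs
        for u, vol in adj[v].items():
            if u in schedule:
                gain[schedule[u]] += vol
        best = max(
            range(num_procs),
            key=lambda p: (loads[p][0] + wm <= cap[0] and loads[p][1] + wr <= cap[1],
                           gain[p],
                           -(loads[p][0] + loads[p][1])),
        )
        schedule[v] = best
        loads[best][0] += wm
        loads[best][1] += wr
    return schedule

# Toy usage: four map tasks, two reduce tasks, two processors.
maps = {"m0": 3.0, "m1": 2.0, "m2": 3.0, "m3": 2.0}
reds = {"r0": 4.0, "r1": 4.0}
shuffle = {("m0", "r0"): 10, ("m1", "r0"): 8, ("m2", "r1"): 9, ("m3", "r1"): 7}
vertices, adj = build_model(maps, reds, shuffle)
print(greedy_two_constraint_schedule(vertices, adj, num_procs=2))

On this toy input the sketch places m0, m1, and r0 on one processor and m2, m3, and r1 on the other, so all shuffle traffic stays local while both the map and the reduce loads remain balanced.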

Files

Original bundle
Name: Locality-aware_and_load-balanced_static_task_scheduling_for_MapReduce.pdf
Size: 937.58 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission