Locality-aware and load-balanced static task scheduling for MapReduce

buir.contributor.author: Selvitopu, Oğuz
buir.contributor.author: Demirci, Gündüz Vehbi
buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: 61
dc.citation.spage: 49
dc.citation.volumeNumber: 90
dc.contributor.author: Selvitopu, Oğuz
dc.contributor.author: Demirci, Gündüz Vehbi
dc.contributor.author: Türk, Ata
dc.contributor.author: Aykanat, Cevdet
dc.date.accessioned: 2020-02-04T05:48:58Z
dc.date.available: 2020-02-04T05:48:58Z
dc.date.issued: 2018
dc.department: Department of Computer Engineering
dc.description.abstract: Task scheduling for MapReduce jobs has been an active area of research with the objective of decreasing the amount of data transferred during the shuffle phase by exploiting data locality. In the literature, generally only the scheduling of reduce tasks is considered, with the assumption that the scheduling of map tasks is already determined by the input data placement. However, in cloud or HPC deployments of MapReduce, the input data is located in remote storage and scheduling map tasks gains importance. Here, we propose models for the simultaneous scheduling of map and reduce tasks in order to improve data locality and balance the processors’ loads in both the map and reduce phases. Our approach is based on graph and hypergraph models which correctly encode the interactions between map and reduce tasks. Partitions produced by these models are decoded to schedule map and reduce tasks. A two-constraint formulation utilized in these models enables balancing processors’ loads in both map and reduce phases. The partitioning objective in the hypergraph models correctly encapsulates the minimization of data transfer when a local combine step is performed prior to shuffle, whereas the partitioning objective in the graph models achieves the same feat when a local combine is not performed. We show the validity of our scheduling on the MapReduce parallelizations of two important kernel operations – sparse matrix–vector multiplication (SpMV) and generalized sparse matrix–matrix multiplication (SpGEMM) – that are widely encountered in big data analytics and scientific computations. Compared to random scheduling, our models lead to tremendous savings in data transfer by reducing data traffic from several hundreds of megabytes to just a few megabytes in the shuffle phase, consequently leading to up to 2.6x and 4.2x speedups for SpMV and SpGEMM, respectively.
dc.description.sponsorship: Research Council of Turkey (TUBITAK)
dc.embargo.release: 2021-01-01
dc.identifier.doi: 10.1016/j.future.2018.06.035
dc.identifier.issn: 0167-739X
dc.identifier.uri: http://hdl.handle.net/11693/53019
dc.language.iso: English
dc.publisher: Elsevier
dc.relation.isversionof: https://dx.doi.org/10.1016/j.future.2018.06.035
dc.source.title: Future Generation Computer Systems
dc.subject: Map
dc.subject: Reduce
dc.subject: Scheduling
dc.subject: Data locality
dc.subject: Load balance
dc.subject: Map task
dc.subject: Reduce task
dc.title: Locality-aware and load-balanced static task scheduling for MapReduce
dc.type: Article
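
A minimal sketch of the two-constraint scheduling model described in the abstract: map and reduce tasks are vertices, each carrying a (map-load, reduce-load) weight pair; shuffle traffic between a map task and a reduce task defines a weighted edge; and a partition of the vertices among processors is decoded into a schedule. The helper names (build_model, greedy_two_constraint_schedule), the input format, and the greedy placement heuristic are assumptions of this sketch only; the paper itself decodes partitions obtained from multi-constraint graph/hypergraph partitioning rather than a greedy pass.

from collections import defaultdict

def build_model(map_tasks, reduce_tasks, shuffle):
    """Build a toy graph model.

    map_tasks / reduce_tasks: {task_id: computational load}
    shuffle: {(map_id, reduce_id): volume of intermediate data}
    """
    # Each vertex gets two weights: (map-phase load, reduce-phase load).
    vertices = {}
    for m, load in map_tasks.items():
        vertices[m] = (load, 0.0)
    for r, load in reduce_tasks.items():
        vertices[r] = (0.0, load)
    # Undirected weighted adjacency: edge weight = shuffle volume between tasks.
    adj = defaultdict(dict)
    for (m, r), vol in shuffle.items():
        adj[m][r] = adj[m].get(r, 0.0) + vol
        adj[r][m] = adj[m][r]
    return vertices, adj

def greedy_two_constraint_schedule(vertices, adj, num_procs, imbalance=1.10):
    """Toy stand-in for a multi-constraint partitioner: assign each task to the
    processor that already holds most of its neighbours (locality), among the
    processors that stay within the allowed imbalance in *both* the map and
    the reduce load."""
    total = [sum(w[i] for w in vertices.values()) for i in (0, 1)]
    cap = [imbalance * t / num_procs for t in total]   # per-constraint capacity
    loads = [[0.0, 0.0] for _ in range(num_procs)]
    schedule = {}
    # Place heavier tasks first so balance is easier to maintain.
    for v in sorted(vertices, key=lambda t: -sum(vertices[t])):
        wm, wr = vertices[v]
        # Shuffle volume that stays local if v joins each processor.
        gain = [0.0] * num_procs
        for u, vol in adj[v].items():
            if u in schedule:
                gain[schedule[u]] += vol
        best = max(
            range(num_procs),
            key=lambda p: (loads[p][0] + wm <= cap[0] and loads[p][1] + wr <= cap[1],
                           gain[p],
                           -(loads[p][0] + loads[p][1])),
        )
        schedule[v] = best
        loads[best][0] += wm
        loads[best][1] += wr
    return schedule

# Toy usage: four map tasks, two reduce tasks, two processors.
maps = {"m0": 3.0, "m1": 2.0, "m2": 3.0, "m3": 2.0}
reds = {"r0": 4.0, "r1": 4.0}
shuffle = {("m0", "r0"): 10, ("m1", "r0"): 8, ("m2", "r1"): 9, ("m3", "r1"): 7}
vertices, adj = build_model(maps, reds, shuffle)
print(greedy_two_constraint_schedule(vertices, adj, num_procs=2))

On this toy input the sketch places m0, m1, and r0 on one processor and m2, m3, and r1 on the other, so all shuffle traffic stays local while both the map and the reduce loads remain balanced.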

Files

Original bundle
Name: Locality-aware_and_load-balanced_static_task_scheduling_for_MapReduce.pdf
Size: 937.58 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission