Application of map/reduce paradigm in supercomputing systems

buir.advisor: Aykanat, Cevdet
dc.contributor.author: Demirci, Gündüz Vehbi
dc.date.accessioned: 2016-01-08T18:26:34Z
dc.date.available: 2016-01-08T18:26:34Z
dc.date.issued: 2013
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Includes bibliographical references (leaves 47-49). [en_US]
dc.description.abstract: Map/Reduce is a framework first introduced by Google for rapidly developing big data analytics applications on distributed computing systems. Although the Map/Reduce paradigm has had a game-changing impact on fields of computer science such as information retrieval and data mining, it has not yet had a comparable impact on scientific computing. Current implementations of Map/Reduce are designed specifically for commodity PC clusters, where compute-node failures are common and inter-processor communication is slow. Scientific computing applications, however, usually run on high performance computing (HPC) systems, which provide high communication bandwidth and low message latency and on which processor failures are rare. As a result, these Map/Reduce implementations degrade performance on HPC systems and are less attractive for scientific computing. For these reasons, implementations of the Map/Reduce paradigm tailored to the scientific computing domain are needed. Among the existing implementations, we focus on the MapReduce-MPI (MR-MPI) library developed at Sandia National Labs. In this thesis, we argue that, with the MR-MPI library, the Map/Reduce programming paradigm can be used successfully for scientific computing applications that require scalability and performance. We tested the MR-MPI library on HPC systems with several fundamental algorithms that are frequently used in the scientific computing and data mining domains: all-pair-similarity-search (APSS), all-pair-shortest-path (APSP), and page-rank (PR). The tests were performed on two well-known large-scale HPC systems, IBM BlueGene/Q (Juqueen) and Cray XE6 (Hermit), to examine the scalability and speedup of these algorithms. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T18:26:34Z (GMT). No. of bitstreams: 1 0006604.pdf: 541501 bytes, checksum: 4b842d25a09ce04cbb80385d342619f0 (MD5) [en]
dc.description.statementofresponsibility: Demirci, Gündüz Vehbi [en_US]
dc.format.extent: x, 49 leaves, charts, tables [en_US]
dc.identifier.itemid: B139371
dc.identifier.uri: http://hdl.handle.net/11693/15906
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Map/Reduce [en_US]
dc.subject: Big Data [en_US]
dc.subject: Data mining [en_US]
dc.subject: Information Retrieval [en_US]
dc.subject: Distributed Computing Systems [en_US]
dc.subject.lcc: QA76.9.D343 D45 2013 [en_US]
dc.subject.lcsh: Big data. [en_US]
dc.subject.lcsh: Data mining. [en_US]
dc.subject.lcsh: Information retrieval. [en_US]
dc.subject.lcsh: Supercomputers. [en_US]
dc.subject.lcsh: Science--Data processing. [en_US]
dc.title: Application of map/reduce paradigm in supercomputing systems [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: 0006604.pdf
Size: 528.81 KB
Format: Adobe Portable Document Format