Site-based partitioning and repartitioning techniques for parallel PageRank computation

buir.contributor.authorAykanat, Cevdet
dc.citation.epage802en_US
dc.citation.issueNumber5en_US
dc.citation.spage786en_US
dc.citation.volumeNumber22en_US
dc.contributor.authorCevahir, A.en_US
dc.contributor.authorAykanat, Cevdeten_US
dc.contributor.authorTurk, A.en_US
dc.contributor.authorCambazoglu, B. B.en_US
dc.date.accessioned2016-02-08T09:54:24Z
dc.date.available2016-02-08T09:54:24Z
dc.date.issued2011-05en_US
dc.departmentDepartment of Computer Engineeringen_US
dc.description.abstractThe PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications, where the web matrices involved grow along with the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high efficiency and low preprocessing overhead while taking into account the initially distributed nature of the web matrices. Our contributions in this work are twofold. First, we investigate the application of state-of-the-art sparse matrix partitioning models to attain high efficiency in parallel PageRank computations, with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve high speedups while incurring a preprocessing overhead of several iterations (for some instances, even less than a single iteration) of the underlying sequential PageRank algorithm. © 2011 IEEE.en_US
dc.description.provenanceMade available in DSpace on 2016-02-08T09:54:24Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2011en
dc.identifier.doi10.1109/TPDS.2010.119en_US
dc.identifier.eissn1558-2183
dc.identifier.issn1045-9219
dc.identifier.urihttp://hdl.handle.net/11693/22022
dc.language.isoEnglishen_US
dc.publisherInstitute of Electrical and Electronics Engineersen_US
dc.relation.isversionofhttp://dx.doi.org/10.1109/TPDS.2010.119en_US
dc.source.titleIEEE Transactions on Parallel and Distributed Systemsen_US
dc.subjectGraph partitioningen_US
dc.subjectHypergraph partitioningen_US
dc.subjectPageRanken_US
dc.subjectParallelizationen_US
dc.subjectRepartitioningen_US
dc.subjectSparse matrix partitioningen_US
dc.subjectSparse matrix-vector multiplicationen_US
dc.subjectWeb searchen_US
dc.titleSite-based partitioning and repartitioning techniques for parallel PageRank computationen_US
dc.typeArticleen_US

Files

Original bundle
Name:
Site-based partitioning and repartitioning techniques for parallel pagerank computation.pdf
Size:
5.53 MB
Format:
Adobe Portable Document Format
Description:
Full printable version