Hypergraph-theoretic partitioning models for parallel web crawling

buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: 25 (en_US)
dc.citation.spage: 19 (en_US)
dc.contributor.author: Türk, Ata (en_US)
dc.contributor.author: Cambazoğlu, B. Barla (en_US)
dc.contributor.author: Aykanat, Cevdet (en_US)
dc.date.accessioned: 2016-02-08T12:10:43Z
dc.date.available: 2016-02-08T12:10:43Z
dc.date.issued: 2012 (en_US)
dc.department: Department of Computer Engineering (en_US)
dc.description: Date of Conference: 26th International Symposium on Computer and Information Sciences (en_US)
dc.description.abstract: Parallel web crawling is an important technique employed by large-scale search engines for content acquisition. A commonly used inter-processor coordination scheme in parallel crawling systems is the link exchange scheme, where discovered links are communicated between processors. This scheme can attain the coverage and quality level of a serial crawler while avoiding redundant crawling of pages by different processors. The main problem in the exchange scheme is the high inter-processor communication overhead. In this work, we propose a hypergraph model that reduces the communication overhead associated with link exchange operations in parallel web crawling systems by intelligent assignment of sites to processors. Our hypergraph model can correctly capture and minimize the number of network messages exchanged between crawlers. We evaluate the performance of our models on four benchmark datasets. Compared to the traditional hash-based assignment approach, significant performance improvements are observed in reducing the inter-processor communication overhead. © 2012 Springer-Verlag London Limited. (en_US)
dc.identifier.doi: 10.1007/978-1-4471-2155-8_2 (en_US)
dc.identifier.doi: 10.1007/978-1-4471-2155-8 (en_US)
dc.identifier.uri: http://hdl.handle.net/11693/28087
dc.language.iso: English (en_US)
dc.publisher: Springer, London (en_US)
dc.relation.isversionof: https://doi.org/10.1007/978-1-4471-2155-8_2 (en_US)
dc.relation.isversionof: https://doi.org/10.1007/978-1-4471-2155-8 (en_US)
dc.source.title: Computer and Information Sciences II (en_US)
dc.subject: Benchmark datasets (en_US)
dc.subject: Communication overheads (en_US)
dc.subject: Content acquisition (en_US)
dc.subject: Coordination scheme (en_US)
dc.subject: Hypergraph model (en_US)
dc.subject: Inter processor communication (en_US)
dc.subject: Interprocessors (en_US)
dc.subject: Network messages (en_US)
dc.subject: Benchmarking (en_US)
dc.subject: Communication (en_US)
dc.subject: Cost reduction (en_US)
dc.subject: Data processing (en_US)
dc.subject: Information science (en_US)
dc.subject: Search engines (en_US)
dc.subject: Parallel processing systems (en_US)
dc.title: Hypergraph-theoretic partitioning models for parallel web crawling (en_US)
dc.type: Conference Paper (en_US)

Files

Original bundle
Name: Hypergraph-theoretic partitioning models for parallel web crawling.pdf
Size: 228.69 KB
Format: Adobe Portable Document Format
Description: Full printable version