Hypergraph-theoretic partitioning models for parallel web crawling
buir.contributor.author | Aykanat, Cevdet | |
dc.citation.epage | 25 | en_US |
dc.citation.spage | 19 | en_US |
dc.contributor.author | Türk, Ata | en_US |
dc.contributor.author | Cambazoğlu, B. Barla | en_US |
dc.contributor.author | Aykanat, Cevdet | en_US |
dc.date.accessioned | 2016-02-08T12:10:43Z | |
dc.date.available | 2016-02-08T12:10:43Z | |
dc.date.issued | 2012 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description | Date of Conference: 26th International Symposium on Computer and Information Sciences | en_US |
dc.description.abstract | Parallel web crawling is an important technique employed by large-scale search engines for content acquisition. A commonly used inter-processor coordination scheme in parallel crawling systems is the link exchange scheme, where discovered links are communicated between processors. This scheme can attain the coverage and quality level of a serial crawler while avoiding redundant crawling of pages by different processors. The main problem in the exchange scheme is the high inter-processor communication overhead. In this work, we propose a hypergraph model that reduces the communication overhead associated with link exchange operations in parallel web crawling systems by intelligent assignment of sites to processors. Our hypergraph model can correctly capture and minimize the number of network messages exchanged between crawlers. We evaluate the performance of our models on four benchmark datasets. Compared to the traditional hash-based assignment approach, significant performance improvements are observed in reducing the inter-processor communication overhead. © 2012 Springer-Verlag London Limited. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T12:10:43Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2012 | en |
dc.identifier.doi | 10.1007/978-1-4471-2155-8_2 | en_US |
dc.identifier.doi | 10.1007/978-1-4471-2155-8 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/28087 | en_US |
dc.language.iso | English | en_US |
dc.publisher | Springer, London | en_US |
dc.relation.isversionof | https://doi.org/10.1007/978-1-4471-2155-8_2 | en_US |
dc.relation.isversionof | https://doi.org/10.1007/978-1-4471-2155-8 | en_US |
dc.source.title | Computer and Information Sciences II | en_US |
dc.subject | Benchmark datasets | en_US |
dc.subject | Communication overheads | en_US |
dc.subject | Content acquisition | en_US |
dc.subject | Coordination scheme | en_US |
dc.subject | Hypergraph model | en_US |
dc.subject | Inter processor communication | en_US |
dc.subject | Interprocessors | en_US |
dc.subject | Network messages | en_US |
dc.subject | Benchmarking | en_US |
dc.subject | Communication | en_US |
dc.subject | Cost reduction | en_US |
dc.subject | Data processing | en_US |
dc.subject | Information science | en_US |
dc.subject | Search engines | en_US |
dc.subject | Parallel processing systems | en_US |
dc.title | Hypergraph-theoretic partitioning models for parallel web crawling | en_US |
dc.type | Conference Paper | en_US |
Files
Original bundle
- Name: Hypergraph-theoretic partitioning models for parallel web crawling.pdf
- Size: 228.69 KB
- Format: Adobe Portable Document Format
- Description: Full printable version