      Hypergraph-theoretic partitioning models for parallel web crawling

      Author
      Türk, Ata
      Cambazoğlu, B. Barla
      Aykanat, Cevdet
      Date
      2012
      Source Title
      Computer and Information Sciences II
      Publisher
      Springer, London
      Pages
      19 - 25
      Language
      English
      Type
      Conference Paper
      Book Chapter
      Abstract
      Parallel web crawling is an important technique employed by large-scale search engines for content acquisition. A commonly used inter-processor coordination scheme in parallel crawling systems is the link exchange scheme, where discovered links are communicated between processors. This scheme can attain the coverage and quality level of a serial crawler while avoiding redundant crawling of pages by different processors. The main problem in the exchange scheme is the high inter-processor communication overhead. In this work, we propose a hypergraph model that reduces the communication overhead associated with link exchange operations in parallel web crawling systems by intelligent assignment of sites to processors. Our hypergraph model can correctly capture and minimize the number of network messages exchanged between crawlers. We evaluate the performance of our models on four benchmark datasets. Compared to the traditional hash-based assignment approach, significant performance improvements are observed in reducing the inter-processor communication overhead. © 2012 Springer-Verlag London Limited.
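As a rough illustration of the communication cost that the link exchange scheme incurs, the Python sketch below assigns sites to processors with a hash function and counts how many cross-processor messages a given assignment would need. The cost proxy (one message for each distinct remote processor that a site links to), the function names `hash_assign` and `count_messages`, and the toy site graph are all assumptions made for illustration; they are not the paper's hypergraph formulation, partitioner, or datasets.

```python
import zlib
from typing import Dict, Iterable, List

# Hypothetical toy site-level link graph: site -> sites it links to.
LINKS: Dict[str, List[str]] = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}


def hash_assign(sites: Iterable[str], num_procs: int) -> Dict[str, int]:
    """Baseline: map each site to a processor by hashing its name."""
    return {s: zlib.crc32(s.encode("utf-8")) % num_procs for s in sites}


def count_messages(links: Dict[str, List[str]], owner: Dict[str, int]) -> int:
    """Assumed cost proxy: for every source site, one message per distinct
    remote processor that owns at least one of its link targets."""
    total = 0
    for src, targets in links.items():
        remote_procs = {owner[t] for t in targets if owner[t] != owner[src]}
        total += len(remote_procs)
    return total


# Two hand-made assignments of the four sites to two processors.
CLUSTERED = {"a.example": 0, "b.example": 0, "c.example": 1, "d.example": 1}
SCATTERED = {"a.example": 0, "b.example": 1, "c.example": 0, "d.example": 1}
```

On this toy graph, `count_messages(LINKS, CLUSTERED)` gives 1 and `count_messages(LINKS, SCATTERED)` gives 4, which shows why an assignment that keeps densely linked sites on the same processor can reduce exchange traffic. A hash-based assignment from `hash_assign(LINKS, 2)` ignores link structure, so its cost depends only on how the hash happens to scatter the sites.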
      Keywords
      Benchmark datasets
      Communication overheads
      Content acquisition
      Coordination scheme
      Hypergraph model
      Inter processor communication
      Interprocessors
      Network messages
      Benchmarking
      Communication
      Cost reduction
      Data processing
      Information science
      Search engines
      Parallel processing systems
      Permalink
      http://hdl.handle.net/11693/28087
      Published Version (Please cite this version)
      https://doi.org/10.1007/978-1-4471-2155-8_2
      https://doi.org/10.1007/978-1-4471-2155-8