Document replication strategies for geographically distributed web search engines
buir.contributor.author | Aykanat, Cevdet | |
dc.citation.epage | 66 | en_US |
dc.citation.issueNumber | 1 | en_US |
dc.citation.spage | 51 | en_US |
dc.citation.volumeNumber | 49 | en_US |
dc.contributor.author | Kayaaslan, E. | en_US |
dc.contributor.author | Cambazoglu, B. B. | en_US |
dc.contributor.author | Aykanat, Cevdet | en_US |
dc.date.accessioned | 2015-07-28T12:01:04Z | |
dc.date.available | 2015-07-28T12:01:04Z | |
dc.date.issued | 2013 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine. © 2012 Elsevier Ltd. All rights reserved. | en_US |
dc.description.provenance | Made available in DSpace on 2015-07-28T12:01:04Z (GMT). No. of bitstreams: 1 10.1016-j.ipm.2012.01.002.pdf: 783263 bytes, checksum: d8c106aebed236a700d80c9e2da9b59c (MD5) | en |
dc.identifier.doi | 10.1016/j.ipm.2012.01.002 | en_US |
dc.identifier.issn | 0306-4573 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/12343 | en_US |
dc.language.iso | English | en_US |
dc.publisher | Elsevier Ltd. | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1016/j.ipm.2012.01.002 | en_US |
dc.source.title | Information Processing & Management | en_US |
dc.subject | Web search | en_US |
dc.subject | Distributed information retrieval | en_US |
dc.subject | Document replication | en_US |
dc.subject | Query processing | en_US |
dc.subject | Query forwarding | en_US |
dc.subject | Result caching | en_US |
dc.subject | Object replication | en_US |
dc.subject | Retrieval systems | en_US |
dc.subject | Database systems | en_US |
dc.subject | Data allocation | en_US |
dc.subject | Algorithms | en_US |
dc.subject | Placement | en_US |
dc.subject | Performance | en_US |
dc.subject | Networks | en_US |
dc.subject | Servers | en_US |
dc.title | Document replication strategies for geographically distributed web search engines | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: 10.1016-j.ipm.2012.01.002.pdf
- Size: 764.91 KB
- Format: Adobe Portable Document Format