Document replication strategies for geographically distributed web search engines

buir.contributor.author: Aykanat, Cevdet
dc.citation.epage: 66 (en_US)
dc.citation.issueNumber: 1 (en_US)
dc.citation.spage: 51 (en_US)
dc.citation.volumeNumber: 49 (en_US)
dc.contributor.author: Kayaaslan, E. (en_US)
dc.contributor.author: Cambazoglu, B. B. (en_US)
dc.contributor.author: Aykanat, Cevdet (en_US)
dc.date.accessioned: 2015-07-28T12:01:04Z
dc.date.available: 2015-07-28T12:01:04Z
dc.date.issued: 2013 (en_US)
dc.department: Department of Computer Engineering (en_US)
dc.description.abstract: Large-scale web search engines are composed of multiple data centers that are geographically distant to each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine. (C) 2012 Elsevier Ltd. All rights reserved. (en_US)
dc.identifier.doi: 10.1016/j.ipm.2012.01.002 (en_US)
dc.identifier.issn: 0306-4573
dc.identifier.uri: http://hdl.handle.net/11693/12343
dc.language.iso: English (en_US)
dc.publisher: Elsevier Ltd. (en_US)
dc.relation.isversionof: http://dx.doi.org/10.1016/j.ipm.2012.01.002 (en_US)
dc.source.title: Information Processing & Management (en_US)
dc.subject: Web search (en_US)
dc.subject: Distributed information retrieval (en_US)
dc.subject: Document replication (en_US)
dc.subject: Query processing (en_US)
dc.subject: Query forwarding (en_US)
dc.subject: Result caching (en_US)
dc.subject: Object replication (en_US)
dc.subject: Retrieval systems (en_US)
dc.subject: Database - systems (en_US)
dc.subject: Data allocation (en_US)
dc.subject: Algorithms (en_US)
dc.subject: Placement (en_US)
dc.subject: Performance (en_US)
dc.subject: Networks (en_US)
dc.subject: Servers (en_US)
dc.title: Document replication strategies for geographically distributed web search engines (en_US)
dc.type: Article (en_US)

Files

Original bundle
Name: 10.1016-j.ipm.2012.01.002.pdf
Size: 764.91 KB
Format: Adobe Portable Document Format