Document replication strategies for geographically distributed web search engines

Kayaaslan, E.; Cambazoglu, B. B.; Aykanat, Cevdet

buir.contributor.author	Aykanat, Cevdet
dc.citation.epage	66	en_US
dc.citation.issueNumber	1	en_US
dc.citation.spage	51	en_US
dc.citation.volumeNumber	49	en_US
dc.contributor.author	Kayaaslan, E.	en_US
dc.contributor.author	Cambazoglu, B. B.	en_US
dc.contributor.author	Aykanat, Cevdet	en_US
dc.date.accessioned	2015-07-28T12:01:04Z
dc.date.available	2015-07-28T12:01:04Z
dc.date.issued	2013	en_US
dc.department	Department of Computer Engineering	en_US
dc.description.abstract	Large-scale web search engines are composed of multiple data centers that are geographically distant to each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine. © 2012 Elsevier Ltd. All rights reserved.	en_US
dc.identifier.doi	10.1016/j.ipm.2012.01.002	en_US
dc.identifier.issn	0306-4573	en_US
dc.identifier.uri	http://hdl.handle.net/11693/12343	en_US
dc.language.iso	English	en_US
dc.publisher	Elsevier Ltd.	en_US
dc.relation.isversionof	http://dx.doi.org/10.1016/j.ipm.2012.01.002	en_US
dc.source.title	Information Processing & Management	en_US
dc.subject	Web search	en_US
dc.subject	Distributed information retrieval	en_US
dc.subject	Document replication	en_US
dc.subject	Query processing	en_US
dc.subject	Query forwarding	en_US
dc.subject	Result caching	en_US
dc.subject	Object replication	en_US
dc.subject	Retrieval systems	en_US
dc.subject	Database systems	en_US
dc.subject	Data allocation	en_US
dc.subject	Algorithms	en_US
dc.subject	Placement	en_US
dc.subject	Performance	en_US
dc.subject	Networks	en_US
dc.subject	Servers	en_US
dc.title	Document replication strategies for geographically distributed web search engines	en_US
dc.type	Article	en_US

Files

Original bundle

Name: 10.1016-j.ipm.2012.01.002.pdf
Size: 764.91 KB
Format: Adobe Portable Document Format

Collections

Scholarly Publications - Computer Engineering