Browsing by Subject "Document collection"
Item (Open Access)
Effect of inverted index partitioning schemes on performance of query processing in parallel text retrieval systems (Springer, 2006-11)
Cambazoğlu, B. Barla; Çatal, A.; Aykanat, Cevdet
Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate the effect of these two index partitioning schemes on query processing. We conduct experiments on a 32-node PC cluster, considering the case where index is completely stored in disk. Performance results are reported for a large (30 GB) document collection using an MPI-based parallel query processing implementation. © Springer-Verlag Berlin Heidelberg 2006.

Item (Open Access)
Query forwarding in geographically distributed search engines (ACM, 2010)
Cambazoglu, B.B.; Varol, Emre; Kayaaslan, Enver; Aykanat, Cevdet; Baeza-Yates, R.
Query forwarding is an important technique for preserving the result quality in distributed search engines where the index is geographically partitioned over multiple search sites. The key component in query forwarding is the thresholding algorithm by which the forwarding decisions are given. In this paper, we propose a linear-programming-based thresholding algorithm that significantly outperforms the current state-of-the-art in terms of achieved search efficiency values. Moreover, we evaluate a greedy heuristic for partial index replication and investigate the impact of result cache freshness on query forwarding performance. Finally, we present some optimizations that improve the performance further, under certain conditions. We evaluate the proposed techniques by simulations over a real-life setting, using a large query log and a document collection obtained from Yahoo!. © 2010 ACM.

Item (Open Access)
Timestamp-based result cache invalidation for web search engines (ACM, 2011)
Alıcı, Sadiye; Altingovde, I.S.; Özcan, Rıfat; Cambazoglu, B.B.; Ulusoy, Özgür
The result cache is a vital component for efficiency of large-scale web search engines, and maintaining the freshness of cached query results is the current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea behind our mechanism is to maintain and compare generation time of query results with update times of posting lists and documents to decide on staleness of query results. The proposed technique is evaluated using a Wikipedia document collection with real update information and a real-life query log. We show that our technique has good prediction accuracy, relative to a baseline based on the time-to-live mechanism. Moreover, it is easy to implement and incurs less processing overhead on the system relative to a recently proposed, more sophisticated invalidation mechanism.
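
Illustrative note on the last entry: the abstract describes comparing the generation time of a cached result against the update times of posting lists and documents. The following is only a minimal sketch of that comparison, not the authors' implementation; the names `CachedResult`, `is_stale`, `term_updated_at`, and `doc_updated_at` are hypothetical, and the paper's actual staleness rules are not given in the abstract and may well be more refined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CachedResult:
    """A cached query result plus the time it was generated (hypothetical structure)."""
    terms: list[str]      # query terms whose posting lists produced the result
    doc_ids: list[int]    # documents returned in the cached result
    generated_at: float   # timestamp at which the result was computed


def is_stale(entry: CachedResult,
             term_updated_at: dict[str, float],
             doc_updated_at: dict[int, float]) -> bool:
    """Flag the entry as stale if any related posting list or result
    document was updated after the result was generated.

    Assumption: a missing timestamp means "never updated" (treated as 0.0).
    """
    for term in entry.terms:
        if term_updated_at.get(term, 0.0) > entry.generated_at:
            return True
    for doc_id in entry.doc_ids:
        if doc_updated_at.get(doc_id, 0.0) > entry.generated_at:
            return True
    return False
```

For comparison, a time-to-live baseline of the kind the abstract mentions would expire an entry after a fixed age regardless of whether anything it depends on has changed, whereas the check above looks only at recorded update timestamps.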