Performance comparison of query evaluation techniques in parallel text retrieval systems
Please cite this item using this persistent URLhttp://hdl.handle.net/11693/14774
Today’s state-of-the-art search engines utilize the inverted index data structure for fast text retrieval on large document collections. To parallelize the retrieval process, the inverted index should be distributed among multiple index servers. Generally the distribution of the inverted index is done in either a term-based or a document-based fashion. The performances of both schemes depend on the total number of disk accesses and the total volume of communication in the system. The classical approach for both distributions is to use the Central Broker Query Evaluation Scheme (CB) for parallel text retrieval. It is known that in this approach the central broker is heavily loaded and becomes a bottleneck. Recently, an alternative query evaluation technique, named Pipelined Query Evaluation Scheme (PPL), has been proposed to alleviate this problem by performing the merge operation on the index servers. In this study, we analyze the scalability and relative performances of the CB and PPL under various query loads to report the benefits and drawbacks of each method.