Efficiency and effectiveness of query processing in cluster-based retrieval
Pages 697-717
Our research shows that, for large databases and without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with inverted index-based full search (FS) in both time efficiency and effectiveness. The proposed CBR method employs a storage structure that blends cluster membership information into the posting lists of the inverted file. This approach significantly reduces the cost of the similarity calculations needed for document ranking during query processing, and so improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of that of FS. The experiments confirm that the approach is scalable and that system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C3M) and the Financial Times database of TREC, which contains 210158 documents (564 MB) defined by 229748 terms, with a total of 29545234 inverted index elements. This study provides CBR efficiency and effectiveness experiments on the largest corpus used in an environment that employs no user interaction or user behavior assumptions for clustering. © 2003 Elsevier Ltd. All rights reserved.
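To make the storage idea in the abstract concrete, the following is a minimal sketch of an inverted index whose posting lists are grouped by cluster, so that query processing can skip every block belonging to a cluster that was not selected. It is an illustration under stated assumptions, not the layout used in the paper: the names `build_index` and `search`, the in-memory dictionary structure, the dot-product scoring, and the toy data are all hypothetical, and the set of best-matching clusters is assumed to have been chosen beforehand.

```python
from collections import defaultdict


def build_index(docs, doc_cluster):
    """Build a cluster-grouped inverted index (hypothetical layout).

    docs:        {doc_id: {term: weight}}
    doc_cluster: {doc_id: cluster_id}
    Returns      {term: {cluster_id: [(doc_id, weight), ...]}}
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            index[term][doc_cluster[doc_id]].append((doc_id, weight))
    return index


def search(index, query, best_clusters):
    """Rank documents, scoring only postings in the selected clusters.

    Returns (ranked, touched): the ranked (doc_id, score) pairs and the
    number of postings actually used in similarity calculations.
    """
    scores = defaultdict(float)
    touched = 0
    for term, query_weight in query.items():
        for cluster_id, postings in index.get(term, {}).items():
            if cluster_id not in best_clusters:
                continue  # skip the whole block for a non-selected cluster
            for doc_id, weight in postings:
                scores[doc_id] += query_weight * weight
                touched += 1
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked, touched


# Toy data (assumed for illustration only).
docs = {
    1: {"index": 1.0, "cluster": 2.0},
    2: {"index": 1.0},
    3: {"cluster": 1.0, "query": 1.0},
    4: {"query": 2.0},
}
doc_cluster = {1: "A", 2: "A", 3: "B", 4: "B"}
index = build_index(docs, doc_cluster)
query = {"cluster": 1.0, "query": 1.0}

cbr_ranked, cbr_touched = search(index, query, best_clusters={"B"})
fs_ranked, fs_touched = search(index, query, best_clusters={"A", "B"})
```

In this toy run, restricting the search to cluster B uses three postings where the full search uses four. The saving in similarity calculations comes from skipping whole cluster blocks, which is the effect the abstract attributes to blending cluster membership into the posting lists.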
Indexing (of information)
Published Version (Please cite this version): http://dx.doi.org/10.1016/S0306-4379(03)00062-0
Showing items related by title, author, creator and subject.
Türel, Anıl; Can, Fazlı (Springer, Berlin, Heidelberg, 2011) Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing of retrieval results for easier access of desired information is an important research problem. In ...
Altıngövde, İsmail Şengör; Özcan, Rıfat; Öcalan, Hüseyin C.; Can, Fazlı; Ulusoy, Özgür (ACM, 2007) We present cluster-based retrieval (CBR) experiments on the largest available Turkish document collection. Our experiments evaluate retrieval effectiveness and efficiency on both an automatically generated clustering ...
Altingovde, I. S.; Demir, E.; Can, F.; Ulusoy, Ö. (Association for Computing Machinery, 2008-06) We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information ...