Large-scale cluster-based retrieval experiments on Turkish texts
Date
2007
Source Title
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher
ACM
Pages
891 - 892
Language
English
Type
Conference Paper
Abstract
We present cluster-based retrieval (CBR) experiments on the largest available Turkish document collection. Our experiments evaluate retrieval effectiveness and efficiency on both an automatically generated clustering structure and a manual classification of documents. In particular, we compare the effectiveness of CBR with that of full-text search (FS) and evaluate several implementation alternatives for CBR. Our findings reveal that CBR yields effectiveness figures comparable to those of FS. Furthermore, by using a specifically tailored cluster-skipping inverted index, we significantly improve the in-memory query processing efficiency of CBR in comparison to other traditional CBR techniques and even FS.
Keywords
Cluster-based retrieval
Cluster-skipping
Inverted index
Turkish
Classification (of information)
Cluster analysis
Data processing
Query languages
Search engines
Cluster based retrieval
Full-text search (FS)
Information retrieval
Permalink
http://hdl.handle.net/11693/27076
Published Version (Please cite this version)
http://dx.doi.org/10.1145/1277741.1277961
Related items
Showing items related by title, author, creator and subject.
-
A new approach to search result clustering and labeling
Türel, Anıl; Can, Fazlı (Springer, Berlin, Heidelberg, 2011)
Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing of retrieval results for easier access of desired information is an important research problem. In ...
-
Efficiency and effectiveness of query processing in cluster-based retrieval
Can, F.; Altingövde, I.S.; Demir, E. (Elsevier, 2004)
Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search ...
-
Exploiting index pruning methods for clustering XML collections
Altıngövde, İsmail Şengör; Atılgan, Duygu; Ulusoy, Özgür (Springer, Berlin, Heidelberg, 2010)
In this paper, we first employ the well known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document ...