Exploiting index pruning methods for clustering XML collections
Altıngövde, İsmail Şengör
Focused Retrieval and Evaluation
Springer, Berlin, Heidelberg
379 - 386
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) to cluster XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that obtained from the original collection, as measured by a set of evaluation metrics. © 2010 Springer-Verlag Berlin Heidelberg.
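As a rough illustration of what pruning document vectors can look like, the sketch below keeps only the highest-weight terms of each vector before any clustering is run. It is a minimal example under stated assumptions, not the method evaluated in the paper: the dictionary representation, the names prune_vector and prune_collection, and the per-document top-weight criterion are all hypothetical choices made for illustration.

```python
from typing import Dict

# Hypothetical representation: each document is a mapping term -> weight.
DocVector = Dict[str, float]


def prune_vector(vec: DocVector, prune_ratio: float) -> DocVector:
    """Drop the lowest-weight terms, removing about `prune_ratio` of the entries."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    # Keep at least one term so a non-empty document never becomes empty.
    keep = max(1, round(len(vec) * (1.0 - prune_ratio)))
    # Rank by descending weight; ties are broken alphabetically for determinism.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:keep])


def prune_collection(docs: Dict[str, DocVector], prune_ratio: float) -> Dict[str, DocVector]:
    """Apply the same pruning ratio to every document vector in the collection."""
    return {doc_id: prune_vector(vec, prune_ratio) for doc_id, vec in docs.items()}
```

With prune_ratio set to 0.7, a vector of ten terms is reduced to its three heaviest terms. The pruned vectors can then be passed to a clustering routine in place of the originals.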
Keywords: Cover-coefficient based clustering
Published Version (Please cite this version): http://dx.doi.org/10.1007/978-3-642-14556-8_37
Showing items related by title, author, creator and subject.
Gürsoy, Attila; Cengiz, Ilker (Springer Verlag, 2001) We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that ...
Altıngövde, İsmail Şengör; Demir, Engin; Can, Fazlı; Ulusoy, Özgür (ACM, 2008-07) Web search engines typically index and retrieve at the page level. In this study, we investigate a dynamic pruning strategy that allows the query processor to first determine the most promising websites and then proceed ...
Improving the efficiency of search engines: strategies for focused crawling, searching, and index pruning. Altıngövde, İsmail Sengör (Bilkent University, 2009) Search engines are the primary means of retrieval for text data that is abundantly available on the Web. A standard search engine should carry out three fundamental tasks, namely, crawling the Web, indexing the crawled ...