Exploiting index pruning methods for clustering XML collections
Author
Altıngövde, İsmail Şengör
Atılgan, Duygu
Ulusoy, Özgür
Date
2010
Source Title
Focused Retrieval and Evaluation
Print ISSN
0302-9743
Publisher
Springer, Berlin, Heidelberg
Volume
6203
Pages
379 - 386
Language
English
Type
Conference Paper
Abstract
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) to cluster XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, up to 70% of the collection (more specifically, of the underlying document vectors) can be pruned while still producing a clustering structure whose quality, in terms of a set of evaluation metrics, is the same as that obtained from the original collection. © 2010 Springer-Verlag Berlin Heidelberg.
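The two ingredients the abstract combines can be illustrated with a small sketch. This is not the authors' implementation. It assumes documents are rows of a non-negative document-term matrix, uses the standard C3M cover coefficient c_ij = α_i Σ_k d_ik β_k d_jk (with α_i = 1 / row sum of document i and β_k = 1 / column sum of term k) and the C3M cluster-count estimate n_c = Σ_i c_ii, and swaps in one simple, generic static pruning rule: each document keeps only its highest-weighted fraction of terms. The function names `prune_document_vectors`, `cover_coefficients` and `c3m_cluster_count` are hypothetical, and the pruning rule is not necessarily one of the techniques evaluated in the paper.

```python
import math
from typing import List

# Document-term matrix: rows are documents, columns are terms, entries are
# non-negative term weights (an assumption; binary vectors also work).
Matrix = List[List[float]]


def prune_document_vectors(d: Matrix, keep_fraction: float) -> Matrix:
    """Hypothetical document-centric static pruning: each document keeps only
    its ceil(keep_fraction * nnz) highest-weighted terms; the rest become 0."""
    pruned = []
    for row in d:
        nonzero = [k for k, w in enumerate(row) if w > 0]
        budget = math.ceil(keep_fraction * len(nonzero))
        # sorted() stays stable with reverse=True: ties keep the lower term index.
        kept = set(sorted(nonzero, key=lambda k: row[k], reverse=True)[:budget])
        pruned.append([w if k in kept else 0.0 for k, w in enumerate(row)])
    return pruned


def cover_coefficients(d: Matrix) -> Matrix:
    """C3M cover-coefficient matrix: c[i][j] = alpha_i * sum_k d_ik * beta_k * d_jk,
    with alpha_i = 1 / (row sum of doc i) and beta_k = 1 / (column sum of term k)."""
    m, n = len(d), len(d[0])
    col_sums = [sum(d[i][k] for i in range(m)) for k in range(n)]
    c = [[0.0] * m for _ in range(m)]
    for i in range(m):
        row_sum = sum(d[i])
        if row_sum == 0:
            continue  # a document pruned to nothing keeps an all-zero row
        for j in range(m):
            # Terms with a zero column sum are unused by every document; skip them.
            s = sum(d[i][k] * d[j][k] / col_sums[k] for k in range(n) if col_sums[k] > 0)
            c[i][j] = s / row_sum
    return c


def c3m_cluster_count(d: Matrix) -> float:
    """Estimated number of clusters n_c: the sum of the decoupling
    coefficients delta_i = c_ii (usually rounded to an integer)."""
    c = cover_coefficients(d)
    return sum(c[i][i] for i in range(len(c)))


if __name__ == "__main__":
    docs = [
        [3.0, 1.0, 0.0, 0.5],
        [2.0, 2.0, 0.0, 0.0],
        [0.0, 0.5, 4.0, 1.0],
    ]
    pruned = prune_document_vectors(docs, keep_fraction=0.5)
    print("n_c before pruning:", round(c3m_cluster_count(docs), 2))
    print("n_c after pruning: ", round(c3m_cluster_count(pruned), 2))
```

On this toy matrix both estimates round to two clusters. A full C3M run would go further, selecting n_c seed documents by seed power and assigning every other document to the seed that covers it most; an evaluation like the one in the abstract would then score the clusterings built from the original and the pruned vectors.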
Keywords
Cover-coefficient based clustering
Index pruning
Clustering index
Document vectors
Evaluation metrics
Pruning methods
Pruning techniques
Based clustering
Query languages
XML
Markup languages
Quality control
Permalink
http://hdl.handle.net/11693/28561
Published Version (Please cite this version)
http://dx.doi.org/10.1007/978-3-642-14556-8_37
https://doi.org/10.1007/978-3-642-14556-8
Related items
Parallel pruning for k-means clustering on shared memory architectures
Gürsoy, Attila; Cengiz, Ilker (Springer Verlag, 2001) We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that ...
Site-based dynamic pruning for query processing in search engines
Altıngövde, İsmail Şengör; Demir, Engin; Can, Fazlı; Ulusoy, Özgür (ACM, 2008-07) Web search engines typically index and retrieve at the page level. In this study, we investigate a dynamic pruning strategy that allows the query processor to first determine the most promising websites and then proceed ...
Improving the efficiency of search engines : strategies for focused crawling, searching, and index pruning
Altıngövde, İsmail Sengör (Bilkent University, 2009) Search engines are the primary means of retrieval for text data that is abundantly available on the Web. A standard search engine should carry out three fundamental tasks, namely, crawling the Web, indexing the crawled ...