Exploiting index pruning methods for clustering XML collections
Date
2010
Source Title
Focused Retrieval and Evaluation
Print ISSN
0302-9743
Publisher
Springer, Berlin, Heidelberg
Volume
6203
Pages
379-386
Language
English
Type
Conference Paper
Abstract
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) to cluster XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure whose quality, in terms of a set of evaluation metrics, is the same as that obtained from the original collection. © 2010 Springer-Verlag Berlin Heidelberg.
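The abstract names C3M and "index pruning techniques from the literature" without giving details. The following is a minimal Python sketch under stated assumptions. It illustrates how pruning document vectors feeds into cover-coefficient clustering.

- The cover coefficients follow the standard C3M definitions. Here c_ij = alpha_i * sum_k d_ik * beta_k * d_jk, and alpha_i and beta_k are the reciprocals of the document (row) and term (column) sums.
- The estimated number of clusters is the rounded sum of the decoupling coefficients delta_i = c_ii.
- Seeds are ranked by delta_i * (1 - delta_i) * x_i.
- The seed-validation and refinement steps of full C3M are omitted.
- The function `prune_document_vectors` and its `keep_fraction` parameter are hypothetical. They implement a simple document-centric rule chosen for illustration and are not necessarily the pruning methods evaluated in the paper.
- The toy matrix is invented.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def cover_coefficients(d: Matrix) -> Matrix:
    """C3M cover-coefficient matrix: c_ij = alpha_i * sum_k d_ik * beta_k * d_jk."""
    m, n = len(d), len(d[0])
    row_sums = [sum(row) for row in d]
    col_sums = [sum(d[i][k] for i in range(m)) for k in range(n)]
    alpha = [1.0 / s if s else 0.0 for s in row_sums]  # reciprocal document sizes
    beta = [1.0 / s if s else 0.0 for s in col_sums]   # reciprocal term totals
    return [[alpha[i] * sum(d[i][k] * beta[k] * d[j][k] for k in range(n))
             for j in range(m)] for i in range(m)]


def c3m(d: Matrix) -> Tuple[List[int], Dict[int, int]]:
    """Simplified C3M: choose seeds by seed power, then assign documents to seeds.

    The seed-validation and refinement steps of full C3M are omitted here.
    """
    m = len(d)
    c = cover_coefficients(d)
    delta = [c[i][i] for i in range(m)]    # decoupling coefficients
    n_c = max(1, round(sum(delta)))          # estimated number of clusters
    sizes = [sum(row) for row in d]
    # Seed power p_i = delta_i * psi_i * x_i, with coupling psi_i = 1 - delta_i.
    power = [delta[i] * (1 - delta[i]) * sizes[i] for i in range(m)]
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:n_c]
    # A seed forms its own cluster; any other document joins the seed covering it most.
    assignment = {i: (i if i in seeds else max(seeds, key=lambda s: c[i][s]))
                  for i in range(m)}
    return seeds, assignment


def prune_document_vectors(d: Matrix, keep_fraction: float) -> Matrix:
    """Hypothetical document-centric pruning: keep each document's top-weighted terms."""
    pruned = []
    for row in d:
        nonzero = [k for k, w in enumerate(row) if w > 0]
        # Keep at least one term so that no document vector becomes empty.
        keep = max(1, math.ceil(keep_fraction * len(nonzero))) if nonzero else 0
        top = set(sorted(nonzero, key=lambda k: row[k], reverse=True)[:keep])
        pruned.append([w if k in top else 0.0 for k, w in enumerate(row)])
    return pruned


# Toy term-frequency matrix (invented): rows are documents, columns are terms.
docs = [
    [4, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 4, 2],
    [0, 0, 1, 1],
]
seeds, assignment = c3m(docs)
pruned = prune_document_vectors(docs, keep_fraction=0.5)  # drops half of the non-zero entries
seeds_p, assignment_p = c3m(pruned)
print(sorted(seeds), assignment)
print(sorted(seeds_p), assignment_p)
```

On this toy matrix, both runs select documents 0 and 2 as seeds and produce the assignment {0: 0, 1: 0, 2: 2, 3: 2}. This holds even though half of the non-zero entries are pruned. That outcome is a property of the toy data only and does not reproduce the paper's experiments.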
Keywords
Cover-coefficient based clustering, Index pruning, Clustering index, Document vectors, Evaluation metrics, Pruning methods, Pruning techniques, Based clustering, Query languages, XML, Markup languages, Quality control