Exploiting index pruning methods for clustering XML collections

Date

2010

Source Title

Focused Retrieval and Evaluation

Print ISSN

0302-9743

Publisher

Springer, Berlin, Heidelberg

Volume

6203

Pages

379 - 386

Language

English

Abstract

In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) to cluster XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that obtained from the original collection, as measured by a set of evaluation metrics. © 2010 Springer-Verlag Berlin Heidelberg.
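
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which pruning techniques were used, so `prune_vector` is a hypothetical stand-in that simply keeps the highest-weighted terms of each document. `decoupling_coefficients` follows the standard C3M definition of the diagonal cover coefficients, whose sum C3M uses to estimate the number of clusters. All function names and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def prune_vector(vec, keep_ratio):
    """Keep the highest-weighted terms of one document vector.

    Hypothetical pruning rule, used only for illustration; the paper's
    actual pruning methods are not described in the abstract.
    """
    k = max(1, int(len(vec) * keep_ratio))
    # Sort by descending weight; break ties by term for determinism.
    top = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def decoupling_coefficients(docs):
    """Return delta_i = c_ii for each document, as defined in C3M.

    c_ii = alpha_i * sum_k (d_ik^2 * beta_k), where alpha_i is the
    reciprocal of the i-th row sum and beta_k the reciprocal of the
    k-th column sum of the document-term matrix.
    """
    col_sums = defaultdict(float)
    for vec in docs:
        for term, weight in vec.items():
            col_sums[term] += weight

    deltas = []
    for vec in docs:
        row_sum = sum(vec.values())
        if row_sum == 0:
            # An empty vector covers nothing; treat its coefficient as 0.
            deltas.append(0.0)
            continue
        covered = sum(w * w / col_sums[t] for t, w in vec.items())
        deltas.append(covered / row_sum)
    return deltas


def estimated_cluster_count(docs):
    """C3M estimates the number of clusters as the sum of all delta_i."""
    return sum(decoupling_coefficients(docs))


# Toy collection: two identical documents and one unrelated document.
docs = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}]
pruned = [prune_vector(d, keep_ratio=0.5) for d in docs]
```

In this sketch, a `keep_ratio` of 0.3 would correspond to pruning 70% of each document vector. Comparing `estimated_cluster_count(docs)` with `estimated_cluster_count(pruned)` gives a rough sense of how much pruning changes the structure C3M sees, though the paper evaluates clustering quality with its own set of metrics.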
