Ensemble pruning for text categorization based on data partitioning
Author
Toraman, Çağrı
Can, Fazlı
Date
2011
Source Title
Information Retrieval Technology
Print ISSN
0302-9743
Publisher
Springer, Berlin, Heidelberg
Volume
7097
Pages
352 - 361
Language
English
Type
Conference Paper
Abstract
Ensemble methods can improve effectiveness in text categorization. Due to the computational cost of ensemble approaches, there is a need for pruning ensembles. In this work we study ensemble pruning based on data partitioning. We use a rank-based pruning approach: base classifiers are ranked and pruned according to their accuracies on a separate validation set. We employ four data partitioning methods with four machine learning categorization algorithms. Our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections: Reuters-21578 and BilCat-TRT. We show that we can prune 90% of ensemble members with almost no decrease in accuracy. We demonstrate that it is possible to increase the accuracy of traditional ensembling with ensemble pruning. © 2011 Springer-Verlag Berlin Heidelberg.
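The abstract describes rank-based pruning only at a high level. The Python sketch below illustrates the general idea under stated assumptions; it is not the authors' implementation. Base classifiers are modelled as arbitrary callables that map a document to a label (for example, one classifier trained per data partition). Random disjoint partitioning stands in for the four partitioning methods studied, which the abstract does not name. The pruned ensemble is combined by plurality voting, which is also an assumption. The function names (`partition_data`, `prune_by_rank`, `majority_vote`) and the `keep_fraction` parameter are hypothetical. A default of `keep_fraction=0.1` mirrors the 90% pruning level reported above.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical model: a base classifier is any callable mapping a document to a label.
Classifier = Callable[[str], Hashable]


def partition_data(data: Sequence, n_parts: int, seed: int = 0) -> List[list]:
    """Split training data into n_parts disjoint, roughly equal subsets.

    Assumption: random disjoint partitioning, used here only as one
    illustrative data partitioning scheme."""
    items = list(data)
    random.Random(seed).shuffle(items)
    return [items[i::n_parts] for i in range(n_parts)]


def accuracy(clf: Classifier, validation: Sequence[Tuple[str, Hashable]]) -> float:
    """Fraction of validation examples the classifier labels correctly."""
    correct = sum(1 for doc, label in validation if clf(doc) == label)
    return correct / len(validation)


def prune_by_rank(ensemble: Sequence[Classifier],
                  validation: Sequence[Tuple[str, Hashable]],
                  keep_fraction: float = 0.1) -> List[Classifier]:
    """Rank base classifiers by validation accuracy and keep the top fraction."""
    ranked = sorted(ensemble, key=lambda c: accuracy(c, validation), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))  # always keep at least one
    return ranked[:n_keep]


def majority_vote(ensemble: Sequence[Classifier], doc: str) -> Hashable:
    """Combine the (pruned) ensemble by plurality voting (an assumed rule)."""
    votes = Counter(clf(doc) for clf in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy validation set and toy base classifiers, for illustration only.
    validation = [("goal scored", "sports"), ("election vote", "politics"),
                  ("late goal", "sports"), ("vote count", "politics")]
    keyword = lambda doc: "sports" if "goal" in doc else "politics"
    always_sports = lambda doc: "sports"
    always_politics = lambda doc: "politics"
    ensemble = [always_sports, keyword, always_politics, always_sports]

    pruned = prune_by_rank(ensemble, validation, keep_fraction=0.25)
    print(len(pruned), majority_vote(pruned, "a late goal"))
```

In the toy run, only the keyword classifier scores 1.0 on the validation set, so pruning 75% of the four members leaves just that classifier. Ranking by validation accuracy is inexpensive once the base classifiers are trained, which is why a rank-based scheme suits the computational-cost motivation given in the abstract.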
Keywords
Data partitioning
Base classifiers
Computation costs
Data-partitioning method
Ensemble members
Ensemble methods
Ensemble pruning
Reuters-21578
Text categorization
Text collection
Data handling
Infrared devices
Text processing
Information retrieval
Permalink
http://hdl.handle.net/11693/28247
Published Version (Please cite this version)
http://dx.doi.org/10.1007/978-3-642-25631-8_32
https://doi.org/10.1007/978-3-642-25631-8