Ensemble pruning for text categorization based on data partitioning

Date
2011
Source Title
Information Retrieval Technology
Print ISSN
0302-9743
Publisher
Springer, Berlin, Heidelberg
Volume
7097
Pages
352 - 361
Language
English
Type
Conference Paper
Abstract

Ensemble methods can improve effectiveness in text categorization. Because of the computational cost of ensemble approaches, there is a need to prune ensembles. In this work we study ensemble pruning based on data partitioning, using a rank-based pruning approach: base classifiers are ranked according to their accuracies on a separate validation set and pruned accordingly. We employ four data partitioning methods with four machine learning categorization algorithms. Our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections: Reuters-21578 and BilCat-TRT. We show that we can prune 90% of ensemble members with almost no decrease in accuracy, and that ensemble pruning can increase the accuracy of traditional ensembling. © 2011 Springer-Verlag Berlin Heidelberg.
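The rank-based pruning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the paper's partitioning methods, base learners, or combination rule. Here each base classifier is modeled as a plain function from a document to a label, members are combined by plurality voting (an assumption), and the helper names `rank_based_prune`, `majority_vote`, and the `keep_fraction` parameter are hypothetical. A `keep_fraction` of 0.1 corresponds to the 90% pruning level reported above.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Assumption: a base classifier is any function mapping a document to a label.
Classifier = Callable[[str], Hashable]


def accuracy(clf: Classifier, docs: Sequence[str], labels: Sequence[Hashable]) -> float:
    """Fraction of validation documents that clf labels correctly."""
    correct = sum(1 for d, y in zip(docs, labels) if clf(d) == y)
    return correct / len(docs)


def rank_based_prune(
    members: Sequence[Classifier],
    val_docs: Sequence[str],
    val_labels: Sequence[Hashable],
    keep_fraction: float = 0.1,  # hypothetical parameter; 0.1 keeps 10%, prunes 90%
) -> list[Classifier]:
    """Rank members by validation accuracy and keep only the top-ranked ones."""
    ranked = sorted(
        members,
        key=lambda clf: accuracy(clf, val_docs, val_labels),
        reverse=True,  # most accurate first; sorted() is stable, so ties keep input order
    )
    k = max(1, round(len(members) * keep_fraction))  # always keep at least one member
    return ranked[:k]


def majority_vote(members: Sequence[Classifier], doc: str) -> Hashable:
    """Combine member predictions by plurality voting (assumed combination rule)."""
    votes = Counter(clf(doc) for clf in members)
    return votes.most_common(1)[0][0]


# Toy usage with hypothetical keyword-based members (not the paper's classifiers).
val_docs = ["oil prices rise", "wheat harvest up", "crude oil exports", "corn and wheat"]
val_labels = ["crude", "grain", "crude", "grain"]


def oil_rule(doc: str) -> str:
    return "crude" if "oil" in doc else "grain"


def always_crude(doc: str) -> str:
    return "crude"


def always_grain(doc: str) -> str:
    return "grain"


members = [always_crude] * 4 + [oil_rule] + [always_grain] * 5  # 10 members
pruned = rank_based_prune(members, val_docs, val_labels, keep_fraction=0.1)
print(len(pruned), majority_vote(pruned, "oil spill"))  # prints: 1 crude
```

In this toy run the constant classifiers score 0.5 on the validation set while `oil_rule` scores 1.0, so pruning 90% of the ten members leaves only `oil_rule`. Validation accuracy is computed once per member inside the sort key, which keeps the sketch short; for large ensembles one would cache these scores.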

Keywords
Data partitioning, Base classifiers, Computation costs, Data-partitioning method, Ensemble members, Ensemble methods, Ensemble pruning, Reuters-21578, Text categorization, Text collection, Data handling, Infrared devices, Text processing, Information retrieval