Ensemble pruning for text categorization based on data partitioning

Date

2011

Source Title

Information Retrieval Technology

Print ISSN

0302-9743

Publisher

Springer, Berlin, Heidelberg

Volume

7097

Pages

352 - 361

Language

English

Abstract

Ensemble methods can improve effectiveness in text categorization. Because of the computational cost of ensemble approaches, there is a need to prune ensembles. In this work, we study ensemble pruning based on data partitioning. We use a rank-based pruning approach: base classifiers are ranked and pruned according to their accuracies on a separate validation set. We employ four data partitioning methods with four machine learning categorization algorithms. Our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections: Reuters-21578 and BilCat-TRT. We show that 90% of ensemble members can be pruned with almost no decrease in accuracy. We also demonstrate that it is possible to increase the accuracy of traditional ensembling with ensemble pruning. © 2011 Springer-Verlag Berlin Heidelberg.
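
The rank-based pruning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`accuracy`, `rank_and_prune`, `majority_vote`) and the parameter `n_keep` are hypothetical, and combining the surviving members by majority vote is an assumption, since the abstract does not say how member outputs are combined. Each base classifier is represented only by the labels it predicted for the validation documents.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of validation documents whose predicted label is correct."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def rank_and_prune(val_predictions, val_labels, n_keep):
    """Rank ensemble members by validation accuracy and keep the best n_keep.

    val_predictions: one list of predicted labels per base classifier.
    Returns the indices of the retained members, most accurate first.
    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(
        range(len(val_predictions)),
        key=lambda i: accuracy(val_predictions[i], val_labels),
        reverse=True,
    )
    return ranked[:n_keep]


def majority_vote(member_predictions):
    """Combine per-member label lists into one label per document
    (an assumed combination rule, chosen here for illustration)."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*member_predictions)
    ]


# Toy validation set with five hypothetical ensemble members.
val_labels = ["earn", "acq", "earn", "grain"]
val_predictions = [
    ["earn", "acq", "earn", "grain"],
    ["earn", "earn", "earn", "grain"],
    ["acq", "acq", "acq", "acq"],
    ["earn", "acq", "grain", "grain"],
    ["grain", "grain", "grain", "grain"],
]

kept = rank_and_prune(val_predictions, val_labels, n_keep=3)
combined = majority_vote([val_predictions[i] for i in kept])
```

Under these assumptions, pruning 90% of a 10-member ensemble corresponds to calling `rank_and_prune` with `n_keep=1`. In practice the retained members would then classify unseen test documents, not the validation documents used to rank them.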
