Ensemble pruning for text categorization based on data partitioning
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
352 - 361
Ensemble methods can improve the effectiveness of text categorization. Because ensemble approaches are computationally costly, there is a need to prune ensembles. In this work we study ensemble pruning based on data partitioning. We use a rank-based pruning approach: base classifiers are ranked by their accuracy on a separate validation set, and the lowest-ranked ones are pruned. We employ four data partitioning methods with four machine learning categorization algorithms. Our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections: Reuters-21578 and BilCat-TRT. We show that we can prune 90% of ensemble members with almost no decrease in accuracy. We also demonstrate that ensemble pruning can increase accuracy over traditional ensembling. © 2011 Springer-Verlag Berlin Heidelberg.
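The rank-based pruning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`accuracy`, `rank_and_prune`, `majority_vote`), the `keep_fraction` parameter, and the use of majority voting to combine the surviving members are all assumptions made for illustration. The sketch assumes each base classifier has already been trained on its own data partition and has produced label predictions for a shared validation set.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def rank_and_prune(val_predictions, val_labels, keep_fraction=0.1):
    """Rank base classifiers by validation accuracy and keep the top fraction.

    val_predictions: one list of predicted labels per base classifier,
                     all made on the same validation documents.
    Returns the indices of the kept classifiers, best first.
    keep_fraction=0.1 mirrors the 90% pruning level mentioned in the
    abstract; it is a hypothetical parameter name.
    """
    scores = [accuracy(preds, val_labels) for preds in val_predictions]
    # Sort classifier indices by descending accuracy (ties keep input order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, round(keep_fraction * len(ranked)))  # keep at least one
    return ranked[:n_keep]


def majority_vote(member_predictions):
    """Combine per-classifier predictions on the same documents by voting.

    Majority voting is an assumed combination rule, not one taken from
    the source.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*member_predictions)]
```

In use, the indices returned by `rank_and_prune` would select which base classifiers are asked to predict on the test documents, and `majority_vote` would combine only those predictions, so the cost at categorization time scales with the pruned ensemble size and not the full one.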
Permalink (Please cite this version): http://hdl.handle.net/11693/28247
Showing items related by title, author, creator and subject.
Toraman, C.; Can, F. (2012) Recent studies show that ensemble pruning works as effectively as a traditional ensemble of classifiers (EoC). In this study, we analyze how ensemble pruning can improve text categorization efficiency in time-critical real-life ...
Qureshi, M. A.; Eksioglu, K. (Institute of Electrical and Electronics Engineers Inc., 2017) The thyroid gland influences the metabolic processes of the human body because it produces hormones. Hyperthyroidism is caused by an increase in the production of thyroid hormones. In this paper a methodology using ...
Bonab, H. R.; Can, F. (Association for Computing Machinery, 2016) Determining the ideal number of component classifiers of an ensemble a priori is an important problem. The volume and velocity of big data streams make this even more crucial in terms of prediction accuracies and resource ...