Ensemble pruning for text categorization based on data partitioning

Toraman, Çağrı; Can, Fazlı

Files

Ensemble pruning for text categorization based on data partitioning.pdf (216.91 KB)

Date

2011

Abstract

Ensemble methods can improve effectiveness in text categorization. Because ensemble approaches are computationally costly, there is a need to prune ensembles. In this work, we study ensemble pruning based on data partitioning. We use a rank-based pruning approach: base classifiers are ranked and pruned according to their accuracies on a separate validation set. We employ four data partitioning methods with four machine learning categorization algorithms. Our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections: Reuters-21578 and BilCat-TRT. We show that we can prune 90% of ensemble members with almost no decrease in accuracy. We also demonstrate that ensemble pruning can increase the accuracy of traditional ensembling. © 2011 Springer-Verlag Berlin Heidelberg.
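
The rank-based pruning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`disjoint_partitions`, `accuracy`, `prune_by_rank`, `majority_vote`), the round-robin disjoint split, the `keep_fraction` parameter, and the majority-vote combination are all assumptions made for illustration; the abstract does not say which four partitioning methods or which combination rule were used. A base classifier is modeled here as any callable that maps a document to a label.

```python
from collections import Counter


def disjoint_partitions(items, k):
    """Split items into k disjoint parts, round-robin (an assumed partitioning scheme)."""
    return [items[i::k] for i in range(k)]


def accuracy(classifier, validation):
    """Fraction of (document, label) validation pairs the classifier labels correctly."""
    correct = sum(1 for doc, label in validation if classifier(doc) == label)
    return correct / len(validation)


def prune_by_rank(classifiers, validation, keep_fraction):
    """Rank base classifiers by validation accuracy and keep the top fraction.

    keep_fraction is a hypothetical parameter: 0.1 keeps 10% of the members,
    i.e. prunes 90% of the ensemble. At least one member is always kept.
    """
    ranked = sorted(classifiers, key=lambda c: accuracy(c, validation), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def majority_vote(classifiers, doc):
    """Combine member predictions by simple majority vote (an assumed combination rule)."""
    votes = Counter(c(doc) for c in classifiers)
    return votes.most_common(1)[0][0]
```

In such a setup, one base classifier would be trained on each partition of the training data, `prune_by_rank` would select members using a held-out validation set, and only the retained members would vote at categorization time, which is where the computational saving comes from.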

Source Title

Information Retrieval Technology

Publisher

Springer, Berlin, Heidelberg

Keywords

Data partitioning, Base classifiers, Computation costs, Data-partitioning method, Ensemble members, Ensemble methods, Ensemble pruning, Reuters-21578, Text categorization, Text collection, Data handling, Infrared devices, Text processing, Information retrieval

Permalink

http://hdl.handle.net/11693/28247

Published Version (Please cite this version)

http://dx.doi.org/10.1007/978-3-642-25631-8_32
https://doi.org/10.1007/978-3-642-25631-8

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Conference Paper
