Performance comparison of feature selection and extraction methods with random instance selection

Date

2021-10-01

Abstract

In pattern recognition, irrelevant and redundant features, together with a large number of noisy instances in the underlying dataset, degrade the performance of trained models and make the training process considerably slower, if not practically infeasible. To combat this so-called curse of dimensionality, one option is to resort to feature selection (FS) methods, which select the features that contribute the most to the performance of the model; another option is to use feature extraction (FE) methods, which map the original feature space into a new space of lower dimensionality. Together, these two families of methods are called feature reduction (FR) methods. However, deploying an FR method on a dataset with a massive number of instances can become a major challenge, in terms of both memory and run time, owing to the complex numerical computations involved. The research question we consider in this study is a simple yet novel one: do these FR methods really need the whole set of instances (WSI) to achieve the best performance, or can similar performance levels be achieved by selecting a much smaller random subset of the WSI before deploying an FR method? In this work, we provide empirical evidence, based on comprehensive computational experiments, that the latter is the case. Specifically, with simple random instance selection followed by FR, the amount of data needed to train a classifier can be drastically reduced with minimal impact on classification performance. We also provide recommendations on which FS/FE method to use in conjunction with which classifier.
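The pipeline described above (draw a random subset of instances, fit an FR method on that subset only, then train a classifier) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' experimental setup: the Fisher-score filter standing in for an FS method, the nearest-centroid classifier, the 10% sampling fraction, and the synthetic data are all illustrative assumptions, and helper names such as `random_instance_subset` are hypothetical.

```python
import random
import statistics


def random_instance_subset(X, y, fraction, seed=0):
    """Uniformly sample a fraction of the instances without replacement."""
    k = max(1, round(fraction * len(X)))
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), k)
    return [X[i] for i in idx], [y[i] for i in idx]


def fisher_scores(X, y):
    """Per-feature Fisher score (assumed example FS filter):
    between-class scatter divided by within-class scatter."""
    scores = []
    for j in range(len(X[0])):
        overall = statistics.fmean([row[j] for row in X])
        between = within = 0.0
        for c in set(y):
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mu = statistics.fmean(vals)
            between += len(vals) * (mu - overall) ** 2
            within += sum((v - mu) ** 2 for v in vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(scores, k):
    """Indices of the k highest-scoring features (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_nearest_centroid(X, y, features):
    """Class centroids computed on the selected features only (assumed classifier)."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [statistics.fmean([r[j] for r in rows]) for j in features]
    return centroids


def predict(centroids, x, features):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((x[j] - m) ** 2 for j, m in zip(features, centroid))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))


def make_data(n, rng):
    """Hypothetical synthetic data: feature 0 separates the classes, features 1-2 are noise."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(5.0 * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
         for label in y]
    return X, y


data_rng = random.Random(1)
X_train, y_train = make_data(1000, data_rng)
X_test, y_test = make_data(1000, data_rng)

# 1) Random instance selection: keep 10% of the training instances.
X_sub, y_sub = random_instance_subset(X_train, y_train, fraction=0.1, seed=42)
# 2) Feature selection fitted on the subset only.
selected = top_k_features(fisher_scores(X_sub, y_sub), k=1)
# 3) Classifier trained on the reduced subset, evaluated on a held-out test set.
model = fit_nearest_centroid(X_sub, y_sub, selected)
accuracy = sum(predict(model, x, selected) == t for x, t in zip(X_test, y_test)) / len(X_test)
print(selected, round(accuracy, 3))
```

Because the FS step sees only 100 of the 1,000 training instances, its cost scales with the subset size rather than with the WSI. An FE method (for example, a projection fitted on the subset and then applied to every instance) would slot into step 2 in the same way.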

Source Title

Expert Systems with Applications

Publisher

Elsevier Ltd

Language

English