Performance comparison of feature selection and extraction methods with random instance selection

buir.contributor.author: Malekipirbazari, Milad
buir.contributor.orcid: Malekipirbazari, Milad | 0000-0002-3212-6498
dc.citation.epage: 115072-18
dc.citation.spage: 115072-1
dc.citation.volumeNumber: 179
dc.contributor.author: Malekipirbazari, Milad
dc.contributor.author: Aksakallı, V.
dc.contributor.author: Shafqat, W.
dc.contributor.author: Eberhard, A.
dc.date.accessioned: 2022-02-10T12:41:27Z
dc.date.available: 2022-02-10T12:41:27Z
dc.date.issued: 2021-10-01
dc.department: Department of Industrial Engineering
dc.description.abstract: In pattern recognition, irrelevant and redundant features, together with a large number of noisy instances in the underlying dataset, decrease the performance of trained models and make the training process considerably slower, if not practically infeasible. To combat this so-called curse of dimensionality, one option is to resort to feature selection (FS) methods, which select the features that contribute the most to the performance of the model; another option is to utilize feature extraction (FE) methods, which map the original feature space into a new space of lower dimensionality. Together, these are called feature reduction (FR) methods. On the other hand, deploying an FR method on a dataset with a massive number of instances can become a major challenge, from both memory and run-time perspectives, due to the complex numerical computations involved. The research question we consider in this study is a simple yet novel one: do these FR methods really need the whole set of instances (WSI) available for best performance, or can we achieve similar performance levels by selecting a much smaller random subset of WSI prior to deploying an FR method? In this work, we provide empirical evidence, based on comprehensive computational experiments, that the answer to this critical research question is in the affirmative. Specifically, with simple random instance selection followed by FR, the amount of data needed for training a classifier can be drastically reduced with minimal impact on classification performance. We also provide recommendations on which FS/FE method to use in conjunction with which classifier.
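The pipeline the abstract describes (random instance selection, then feature reduction, then classifier training) can be illustrated with a minimal sketch. This assumes scikit-learn and a synthetic dataset; the subset size of 5,000, PCA as the FE method, and the random forest classifier are illustrative choices, not the exact configurations evaluated in the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large dataset with redundant features.
X, y = make_classification(n_samples=100_000, n_features=100,
                           n_informative=15, n_redundant=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: random instance selection -- keep a small random subset of the
# whole set of instances (WSI) instead of using all of it.
rng = np.random.default_rng(0)
idx = rng.choice(len(X_train), size=5_000, replace=False)
X_sub, y_sub = X_train[idx], y_train[idx]

# Step 2: feature reduction, fitted only on the subset (here FE via PCA;
# an FS method such as univariate selection could be swapped in).
fe = PCA(n_components=20).fit(X_sub)

# Step 3: train the classifier on the reduced subset.
clf = RandomForestClassifier(random_state=0).fit(fe.transform(X_sub), y_sub)

# Evaluate on held-out data, transformed with the same fitted FE mapping.
print(accuracy_score(y_test, clf.predict(fe.transform(X_test))))

Comparing this score against a model trained on the full training set with the same FR method gives the kind of WSI-versus-subset comparison the study reports.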
dc.embargo.release: 2023-10-01
dc.identifier.doi: 10.1016/j.eswa.2021.115072
dc.identifier.issn: 0957-4174
dc.identifier.uri: http://hdl.handle.net/11693/77239
dc.language.iso: English
dc.publisher: Elsevier Ltd
dc.relation.isversionof: https://doi.org/10.1016/j.eswa.2021.115072
dc.source.title: Expert Systems with Applications
dc.subject: Explainable artificial intelligence
dc.subject: Dimension reduction
dc.subject: Feature selection
dc.subject: Feature extraction
dc.subject: Instance selection
dc.subject: Data preprocessing
dc.title: Performance comparison of feature selection and extraction methods with random instance selection
dc.type: Article

Files

Original bundle

Name: Performance_comparison_of_feature_selection_and_extraction_methods_with_random_instance_selection.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.69 KB
Description: Item-specific license agreed upon to submission