Performance comparison of feature selection and extraction methods with random instance selection

buir.contributor.author: Malekipirbazari, Milad
buir.contributor.orcid: Malekipirbazari, Milad | 0000-0002-3212-6498
dc.citation.epage: 115072-18
dc.citation.spage: 115072-1
dc.citation.volumeNumber: 179
dc.contributor.author: Malekipirbazari, Milad
dc.contributor.author: Aksakallı, V.
dc.contributor.author: Shafqat, W.
dc.contributor.author: Eberhard, A.
dc.date.accessioned: 2022-02-10T12:41:27Z
dc.date.available: 2022-02-10T12:41:27Z
dc.date.issued: 2021-10-01
dc.department: Department of Industrial Engineering
dc.description.abstract: In pattern recognition, irrelevant and redundant features, together with a large number of noisy instances in the underlying dataset, decrease the performance of trained models and make the training process considerably slower, if not practically infeasible. To combat this so-called curse of dimensionality, one option is to resort to feature selection (FS) methods, which select the features that contribute the most to the performance of the model; another option is to utilize feature extraction (FE) methods, which map the original feature space into a new space of lower dimensionality. Together, these are called feature reduction (FR) methods. On the other hand, deploying an FR method on a dataset with a massive number of instances can become a major challenge, from both memory and run-time perspectives, due to the complex numerical computations involved. The research question we consider in this study is a simple yet novel one: do these FR methods really need the whole set of instances (WSI) available for best performance, or can we achieve similar performance levels by selecting a much smaller random subset of WSI prior to deploying an FR method? In this work, we provide empirical evidence, based on comprehensive computational experiments, that the answer to this critical research question is in the affirmative. Specifically, with simple random instance selection followed by FR, the amount of data needed for training a classifier can be drastically reduced with minimal impact on classification performance. We also provide recommendations on which FS/FE method to use in conjunction with which classifier.
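The pipeline the abstract describes (random instance selection, then feature reduction, then classifier training) can be illustrated with a minimal sketch. This assumes scikit-learn and a synthetic dataset; the subset size of 5,000, PCA as the FE method, and the random forest classifier are illustrative choices, not the exact configurations evaluated in the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large dataset with redundant features.
X, y = make_classification(n_samples=100_000, n_features=100,
                           n_informative=15, n_redundant=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: random instance selection -- keep a small random subset of the
# whole set of instances (WSI) instead of using all of it.
rng = np.random.default_rng(0)
idx = rng.choice(len(X_train), size=5_000, replace=False)
X_sub, y_sub = X_train[idx], y_train[idx]

# Step 2: feature reduction, fitted only on the subset (here FE via PCA;
# an FS method such as univariate selection could be swapped in).
fe = PCA(n_components=20).fit(X_sub)

# Step 3: train the classifier on the reduced subset.
clf = RandomForestClassifier(random_state=0).fit(fe.transform(X_sub), y_sub)

# Evaluate on held-out data, transformed with the same fitted FE mapping.
print(accuracy_score(y_test, clf.predict(fe.transform(X_test))))

Comparing this score against a model trained on the full training set with the same FR method gives the kind of WSI-versus-subset comparison the study reports.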
dc.embargo.release: 2023-10-01
dc.identifier.doi: 10.1016/j.eswa.2021.115072
dc.identifier.issn: 0957-4174
dc.identifier.uri: http://hdl.handle.net/11693/77239
dc.language.iso: English
dc.publisher: Elsevier Ltd
dc.relation.isversionof: https://doi.org/10.1016/j.eswa.2021.115072
dc.source.title: Expert Systems with Applications
dc.subject: Explainable artificial intelligence
dc.subject: Dimension reduction
dc.subject: Feature selection
dc.subject: Feature extraction
dc.subject: Instance selection
dc.subject: Data preprocessing
dc.title: Performance comparison of feature selection and extraction methods with random instance selection
dc.type: Article

Files

Original bundle

Name: Performance_comparison_of_feature_selection_and_extraction_methods_with_random_instance_selection.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.69 KB
Description: Item-specific license agreed upon to submission