Dynamic ensemble diversification and hash-based undersampling for the classification of multi-class imbalanced data streams
buir.advisor | Can, Fazlı | |
dc.contributor.author | Abadifard, Soheil | |
dc.date.accessioned | 2024-08-09T09:08:03Z | |
dc.date.available | 2024-08-09T09:08:03Z | |
dc.date.copyright | 2024-07 | |
dc.date.issued | 2024-07 | |
dc.date.submitted | 2024-08-05 | |
dc.description | Cataloged from PDF version of article. | |
dc.description | Thesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2024. | |
dc.description | Includes bibliographical references (leaves 56-66). | |
dc.description.abstract | The classification of imbalanced data streams, which have unequal class distributions, is a key difficulty in machine learning, especially when dealing with multiple classes and concept drift. While binary imbalanced data stream classification has received considerable attention, only a few studies have focused on multi-class imbalanced data streams. Handling a dynamic imbalance ratio, one that changes as the stream evolves, is also of great importance. This study introduces a novel, robust, and resilient approach to these challenges by integrating Locality Sensitive Hashing with Random Hyperplane Projections (LSH-RHP) into the Dynamic Ensemble Diversification (DynED) framework. To the best of our knowledge, this is the first application of LSH-RHP to undersampling in the context of imbalanced non-stationary data streams. The proposed method, LSH-DynED, undersamples majority classes using LSH-RHP, which provides a balanced training set and improves the ensemble's prediction accuracy. We conduct comprehensive experiments on 23 real-world and ten semi-synthetic datasets and compare LSH-DynED with 15 state-of-the-art methods. The results show that LSH-DynED outperforms the other approaches on both the Kappa and mG-Mean effectiveness measures, demonstrating its capability to handle multi-class imbalanced non-stationary data streams. Notably, LSH-DynED performs well on large-scale, high-dimensional datasets with considerable class imbalance and demonstrates adaptability and robustness in real-world settings. To support the reproducibility of our results, we have made our implementation available on GitHub. | |
dc.description.provenance | Submitted by İlknur Sarıkaya (ilknur.sarikaya@bilkent.edu.tr) on 2024-08-09T09:08:03Z No. of bitstreams: 1 B151424.pdf: 4129351 bytes, checksum: 4c8f38c46c16b6d6052c661e934da6ca (MD5) | en |
dc.description.provenance | Made available in DSpace on 2024-08-09T09:08:03Z (GMT). No. of bitstreams: 1 B151424.pdf: 4129351 bytes, checksum: 4c8f38c46c16b6d6052c661e934da6ca (MD5) Previous issue date: 2024-07 | en |
dc.description.statementofresponsibility | by Soheil Abadifard | |
dc.embargo.release | 2025-02-05 | |
dc.format.extent | xii, 67 leaves : illustrations, charts ; 30 cm. | |
dc.identifier.itemid | B151424 | |
dc.identifier.uri | https://hdl.handle.net/11693/115735 | |
dc.language.iso | English | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | Data stream | |
dc.subject | Classification | |
dc.subject | Class imbalance | |
dc.subject | Concept drift | |
dc.subject | Ensemble learning | |
dc.subject | Locality sensitive hashing | |
dc.title | Dynamic ensemble diversification and hash-based undersampling for the classification of multi-class imbalanced data streams | |
dc.title.alternative | Çok sınıflı dengesiz veri akışlarının sınıflandırılması için dinamik topluluk çeşitlendirme ve kargaşa-tabanlı az örnekleme | |
dc.type | Thesis | |
thesis.degree.discipline | Computer Engineering | |
thesis.degree.grantor | Bilkent University | |
thesis.degree.level | Master's | |
thesis.degree.name | MS (Master of Science) | |
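The abstract describes undersampling majority classes with Locality Sensitive Hashing based on Random Hyperplane Projections (LSH-RHP). As a rough illustration of that general idea, the sketch below hashes each sample to a bucket by the signs of its projections onto random hyperplanes, then draws samples round-robin across buckets so the reduced class still covers different regions of the feature space. This is a minimal sketch under stated assumptions, not the thesis's implementation (which is available on GitHub): the function names `rhp_signatures`, `lsh_undersample`, and `balance`, the parameters `n_planes` and `seed`, and the choice of the smallest class size as the undersampling target are all hypothetical.

```python
import numpy as np


def rhp_signatures(X, n_planes, rng):
    # Hypothetical helper: draw n_planes random hyperplanes through the origin
    # (Gaussian normal vectors), one column per hyperplane.
    planes = rng.standard_normal((X.shape[1], n_planes))
    # Bit k of a sample's signature is True if it lies on the non-negative side of plane k.
    return (X @ planes) >= 0


def lsh_undersample(X, target_size, n_planes=4, seed=0):
    # Hypothetical LSH-RHP undersampler: reduce X to target_size rows.
    rng = np.random.default_rng(seed)
    bits = rhp_signatures(X, n_planes, rng)
    # Pack each boolean signature into an integer bucket key.
    keys = bits.astype(int) @ (1 << np.arange(n_planes))
    buckets = {}
    for idx, key in enumerate(keys):
        buckets.setdefault(int(key), []).append(idx)
    # Shuffle each bucket, then take one sample per bucket per pass (round-robin)
    # so that every occupied bucket keeps representatives.
    queues = [rng.permutation(members).tolist() for members in buckets.values()]
    limit = min(target_size, len(X))
    selected = []
    while len(selected) < limit:
        for q in queues:
            if q and len(selected) < limit:
                selected.append(q.pop())
    return X[np.sort(np.array(selected, dtype=int))]


def balance(X, y, **kwargs):
    # Assumption: undersample every class down to the size of the smallest class.
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    parts_X, parts_y = [], []
    for c in classes:
        Xc = X[y == c]
        Xs = lsh_undersample(Xc, target, **kwargs) if len(Xc) > target else Xc
        parts_X.append(Xs)
        parts_y.append(np.full(len(Xs), c))
    return np.vstack(parts_X), np.concatenate(parts_y)
```

For example, with 100 samples of class 0 and 10 of class 1, `balance(X, y, n_planes=3)` returns 10 samples of each class. Drawing across hash buckets rather than uniformly at random is the design choice being illustrated: samples that fall on the same side of every hyperplane are likely to be close in angle, so spreading the selection over buckets tends to keep the retained majority samples diverse.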