Dynamic ensemble diversification and hash-based undersampling for the classification of multi-class imbalanced data streams

buir.advisor: Can, Fazlı
dc.contributor.author: Abadifard, Soheil
dc.date.accessioned: 2024-08-09T09:08:03Z
dc.date.available: 2024-08-09T09:08:03Z
dc.date.copyright: 2024-07
dc.date.issued: 2024-07
dc.date.submitted: 2024-08-05
dc.description: Cataloged from PDF version of article.
dc.description: Thesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2024.
dc.description: Includes bibliographical references (leaves 56-66).
dc.description.abstract: The classification of imbalanced data streams, which have unequal class distributions, is a key difficulty in machine learning, especially when multiple classes and concept drift are involved. While binary imbalanced data stream classification has received considerable attention, only a few studies have focused on multi-class imbalanced data streams. Handling a dynamically changing imbalance ratio is also of great importance. This study introduces a novel, robust, and resilient approach to these challenges by integrating Locality Sensitive Hashing with Random Hyperplane Projections (LSH-RHP) into the Dynamic Ensemble Diversification (DynED) framework. To the best of our knowledge, this is the first application of LSH-RHP for undersampling in the context of imbalanced non-stationary data streams. The proposed method, LSH-DynED, undersamples majority classes using LSH-RHP, providing a balanced training set and improving the ensemble's prediction accuracy. We conduct comprehensive experiments on 23 real-world and 10 semi-synthetic datasets and compare LSH-DynED with 15 state-of-the-art methods. The results show that LSH-DynED outperforms the other approaches in terms of both Kappa and mG-Mean, demonstrating its capability in handling multi-class imbalanced non-stationary data streams. Notably, LSH-DynED performs well on large-scale, high-dimensional datasets with considerable class imbalance and demonstrates adaptability and robustness in real-world circumstances. For reproducibility, our implementation is available on GitHub.
dc.description.provenance (en): Submitted by İlknur Sarıkaya (ilknur.sarikaya@bilkent.edu.tr) on 2024-08-09T09:08:03Z. No. of bitstreams: 1. B151424.pdf: 4129351 bytes, checksum: 4c8f38c46c16b6d6052c661e934da6ca (MD5)
dc.description.provenance (en): Made available in DSpace on 2024-08-09T09:08:03Z (GMT). No. of bitstreams: 1. B151424.pdf: 4129351 bytes, checksum: 4c8f38c46c16b6d6052c661e934da6ca (MD5). Previous issue date: 2024-07
dc.description.statementofresponsibility: by Soheil Abadifard
dc.embargo.release: 2025-02-05
dc.format.extent: xii, 67 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B151424
dc.identifier.uri: https://hdl.handle.net/11693/115735
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Data stream
dc.subject: Classification
dc.subject: Class imbalance
dc.subject: Concept drift
dc.subject: Ensemble learning
dc.subject: Locality sensitive hashing
dc.title: Dynamic ensemble diversification and hash-based undersampling for the classification of multi-class imbalanced data streams
dc.title.alternative: Çok sınıflı dengesiz veri akışlarının sınıflandırılması için dinamik topluluk çeşitlendirme ve kargaşa-tabanlı az örnekleme
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
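The abstract names LSH-RHP undersampling but does not state the exact sampling rule used in LSH-DynED. As a rough illustration of the general idea only, the following is a minimal Python sketch of generic random-hyperplane LSH undersampling, assuming that each majority-class sample is hashed by the signs of its projections onto a few random hyperplanes and that samples are then drawn round-robin across buckets so that distinct regions of feature space stay represented. The functions `rhp_signatures` and `lsh_undersample`, the parameters `n_planes` and `seed`, the "shrink every class to the minority size" target, and the round-robin draw are all hypothetical choices, not the thesis's implementation.

```python
import numpy as np


def rhp_signatures(X, planes):
    """Random hyperplane LSH: one bit per hyperplane, set when the sample
    lies on the non-negative side of that hyperplane (sign of the dot product)."""
    bits = (X @ planes.T) >= 0  # shape (n_samples, n_planes), boolean
    # Pack each bit vector into a single integer bucket id.
    weights = 1 << np.arange(planes.shape[0])
    return bits.astype(np.int64) @ weights


def lsh_undersample(X, y, n_planes=4, seed=0):
    """Hypothetical sketch: shrink every class to the size of the smallest
    class by drawing samples round-robin from its LSH buckets."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, X.shape[1]))
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if len(idx) <= target:
            keep.extend(idx.tolist())  # minority class: keep everything
            continue
        # Group this class's sample indices by bucket id.
        buckets = {}
        for i, sig in zip(idx, rhp_signatures(X[idx], planes)):
            buckets.setdefault(int(sig), []).append(int(i))
        # Shuffle inside each bucket, then take one sample per bucket per round.
        queues = [list(rng.permutation(b)) for b in buckets.values()]
        chosen = []
        while len(chosen) < target:
            for q in queues:
                if q and len(chosen) < target:
                    chosen.append(int(q.pop()))
        keep.extend(chosen)
    keep = np.array(sorted(keep))
    return X[keep], y[keep]
```

For example, with 100 samples of class 0 and 10 of class 1, `lsh_undersample(X, y)` would return 10 samples of each class, keeping every minority sample. Drawing across buckets rather than uniformly at random is the design choice that makes this hash-based: a uniform draw could over-sample dense regions of the majority class, whereas the bucket round-robin spreads the retained samples over the hashed regions.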

Files

Original bundle

Name: B151424.pdf
Size: 3.94 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.1 KB
Format: Item-specific license agreed upon to submission
Description: