Browsing by Subject "Concept drift"
Now showing 1 - 12 of 12
Item Open Access A broad ensemble learning system for drifting stream classification (Institute of Electrical and Electronics Engineers, 2023-08-21) Bakhshi, Sepehr; Ghahramanian, Pouya; Bonab, H.; Can, Fazlı

In a data stream environment, classification models must handle concept drift effectively and efficiently. Ensemble methods are widely used for this purpose; however, those available in the literature either use a large data chunk to update the model or learn the data one instance at a time. In the former case, the model may miss changes in the data distribution; in the latter, it may suffer from inefficiency and instability. To address these issues, we introduce a novel ensemble approach based on the Broad Learning System (BLS), in which mini chunks are used at each update. BLS is an effective lightweight neural architecture recently developed for incremental learning. Although it is fast, it requires huge data chunks for effective updates and is unable to handle the dynamic changes observed in data streams. Our proposed approach, named Broad Ensemble Learning System (BELS), uses a novel updating method that significantly improves best-in-class model accuracy. It employs an ensemble of output layers to address the limitations of BLS and handle drifts. Our model tracks changes in the accuracy of the ensemble components and reacts to these changes. We present our mathematical derivation of BELS, perform comprehensive experiments with 35 datasets that demonstrate the adaptability of our model to various drift types, and provide hyperparameter, ablation, and imbalanced-dataset performance analyses.
The experimental results show that the proposed approach outperforms 10 state-of-the-art baselines and provides an overall improvement of 18.59% in terms of average prequential accuracy.

Item Open Access BELS: a broad ensemble learning system for data stream classification (Bilkent University, 2021-12) Bakhshi, Sepehr

Data stream classification has become a major research topic due to the increase in temporal data. One of the biggest hurdles of data stream classification is the development of algorithms that deal with evolving data, also known as concept drifts. As data change over time, static prediction models lose their validity. Adapting to concept drifts provides more robust and better-performing models. The Broad Learning System (BLS) is an effective broad neural architecture recently developed for incremental learning. BLS cannot provide an instant response since it requires huge data chunks, and it is unable to handle concept drifts. We propose a Broad Ensemble Learning System (BELS) for stream classification with concept drift. BELS uses a novel updating method that greatly improves best-in-class model accuracy. It employs a dynamic output ensemble layer to address the limitations of BLS. We present its mathematical derivation, provide comprehensive experiments with 11 datasets that demonstrate the adaptability of our model, including a comparison with BLS, and provide parameter and robustness analysis on several drifting streams, showing that it statistically significantly outperforms seven state-of-the-art baselines. We show that our proposed method improves accuracy by 44% on average compared to BLS, and by 29% compared to other competitive baselines.

Item Open Access Concept learning using one-class classifiers for implicit drift detection in evolving data streams (Springer, 2021-06) Gözüaçık, Ömer; Can, Fazli

Data stream mining has become an important research area over the past decade due to the increasing amount of data available today.
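The BELS results above are reported in terms of average prequential accuracy. As a hedged illustration of that metric (a sketch of the standard test-then-train protocol, not code from the paper; `MajorityClass` is a toy stand-in learner, not part of BELS):

```python
# Prequential (test-then-train) evaluation: each instance is first used
# to test the model, then to update it, so accuracy always reflects
# predictions made on data the model has not yet seen.

def prequential_accuracy(model, stream):
    """stream: iterable of (x, y); model exposes predict(x) and learn(x, y)."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:   # test on the instance first...
            correct += 1
        model.learn(x, y)           # ...then train on it
        total += 1
    return correct / total if total else 0.0

class MajorityClass:
    """Toy incremental learner: predicts the majority label seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

Because testing precedes training on every instance, the score penalizes a model that adapts slowly after a drift, which is why the metric is standard in this literature.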
Sources from various domains generate a near-limitless volume of data in temporal order. Such data are referred to as data streams, and they are generally nonstationary, as the characteristics of the data evolve over time. This phenomenon is called concept drift, and it is an issue of great importance in the literature, since it makes models obsolete by decreasing their predictive performance. In the presence of concept drift, it is necessary to adapt to changes in the data to build more robust and effective classifiers. Drift detectors are designed to run jointly with classification models, updating them when a significant change in the data distribution is observed. In this paper, we present an implicit (unsupervised) algorithm called One-Class Drift Detector (OCDD), which uses a one-class learner with a sliding window to detect concept drift. We perform a comprehensive evaluation of 17 mostly recent, prevalent concept drift detection methods and an adaptive classifier using 13 datasets. The results show that OCDD outperforms the other methods by producing models with better predictive performance on both real-world and synthetic datasets.

Item Open Access DynED: dynamic ensemble diversification in data stream classification (Association for Computing Machinery, 2023-10-23) Abadifard, Soheil; Gheibuni, Sanaz; Bakhshi, Sepehr; Can, Fazlı

Ensemble methods are commonly used in classification due to their remarkable performance. Achieving high accuracy in a data stream environment is a challenging task, considering disruptive changes in the data distribution, also known as concept drift. A greater diversity of ensemble components is known to enhance prediction accuracy in such settings. Despite the diversity of components within an ensemble, not all contribute as expected to its overall performance. This necessitates a method for selecting components that exhibit high performance and diversity.
We present a novel ensemble construction and maintenance approach based on MMR (Maximal Marginal Relevance) that dynamically combines the diversity and prediction accuracy of components while structuring the ensemble. The experimental results on four real and 11 synthetic datasets demonstrate that the proposed approach (DynED) provides a higher average mean accuracy compared to five state-of-the-art baselines.

Item Open Access Evolving text stream classification with a novel neural ensemble architecture (Bilkent University, 2022-01) Ghahramanian, Pouya

We study on-the-fly classification of evolving text streams in which the relation between the input data and target labels changes over time, i.e., "concept drift". These variations decrease the model's performance, as predictions become less accurate over time, and they necessitate a more adaptable system. We introduce the Adaptive Neural Ensemble Network (AdaNEN), a novel ensemble-based neural approach capable of handling concept drift in text streams. With our novel architecture, we address some of the problems neural models face when exploited in online adaptive learning environments. The problem of evolving text stream classification is relatively unexplored, and most existing studies address concept drift detection and handling in numerical streams. We hypothesize that the lack of public, large-scale experimental data could be one reason. To this end, we propose a method, based on an existing approach, for generating evolving text streams by inducing various types of concept drift in real-world text datasets. We provide an extensive evaluation of our proposed approach using 12 state-of-the-art baselines and eight datasets.
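The MMR-based selection used by DynED above balances each candidate component's accuracy against its redundancy with components already chosen. A minimal sketch of that greedy trade-off follows; the function name, inputs, and the `lam` value are illustrative assumptions, not DynED's actual interface or data:

```python
# Greedy Maximal Marginal Relevance (MMR) selection: each round picks the
# candidate maximizing lam * accuracy - (1 - lam) * redundancy, where
# redundancy is its highest similarity to any already-selected component.

def mmr_select(accuracy, similarity, k, lam=0.7):
    """accuracy: name -> score; similarity: sorted (a, b) pair -> [0, 1]."""
    selected = []
    candidates = set(accuracy)
    while candidates and len(selected) < k:
        def mmr(c):
            redundancy = max((similarity[tuple(sorted((c, s)))]
                              for s in selected), default=0.0)
            return lam * accuracy[c] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With this scoring, a highly accurate component that is nearly identical to one already selected can lose out to a weaker but diverse one, which is exactly the effect the abstract attributes to combining accuracy with diversity.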
Our experimental results show that our proposed method, AdaNEN, consistently outperforms the existing approaches in terms of predictive performance while remaining efficient.

Item Open Access Goowe: geometrically optimum and online-weighted ensemble classifier for evolving data streams (Bilkent University, 2016-07) Asl-Bonab, Hamed Rezanejad

Designing adaptive classifiers for an evolving data stream is a challenging task due to its size and dynamically changing nature. Combining individual classifiers in an online setting, the ensemble approach, is one of the well-known solutions. It is possible that a subset of classifiers in the ensemble outperforms the others in a time-varying fashion. However, optimum weight assignment for component classifiers is a problem that is not yet fully addressed in online evolving environments. We propose a novel data stream ensemble classifier, called Geometrically Optimum and Online-Weighted Ensemble (GOOWE), which assigns optimum weights to the component classifiers using a sliding window containing the most recent data instances. We map the vote scores of individual classifiers and the true class labels into a spatial environment. Based on the Euclidean distance between vote scores and ideal points, and using the linear least squares (LSQ) solution, we present a novel dynamic and online weighting approach. While LSQ has been used for batch-mode ensemble classifiers, we are the first to adapt it to online environments by providing a spatial modeling of online ensembles. To show the robustness of the proposed algorithm, we use real-world datasets and synthetic data generators from the MOA libraries. We compare our results with 8 state-of-the-art ensemble classifiers in a comprehensive experimental environment. Our experiments show that GOOWE provides improved reactions to different types of concept drift compared to our baselines.
The statistical tests indicate a significant improvement in accuracy, with conservative time and memory requirements.

Item Open Access GOOWE: geometrically optimum and online-weighted ensemble classifier for evolving data streams (Association for Computing Machinery, 2018-01-25) Bonab, H. R.; Can, Fazlı

Designing adaptive classifiers for an evolving data stream is a challenging task due to the data size and its dynamically changing nature. Combining individual classifiers in an online setting, the ensemble approach, is a well-known solution. It is possible that a subset of classifiers in the ensemble outperforms the others in a time-varying fashion. However, optimum weight assignment for component classifiers is a problem that is not yet fully addressed in online evolving environments. We propose a novel data stream ensemble classifier, called Geometrically Optimum and Online-Weighted Ensemble (GOOWE), which assigns optimum weights to the component classifiers using a sliding window containing the most recent data instances. We map the vote scores of individual classifiers and the true class labels into a spatial environment. Based on the Euclidean distance between vote scores and ideal points, and using the linear least squares (LSQ) solution, we present a novel, dynamic, and online weighting approach. While LSQ has been used for batch-mode ensemble classifiers, we are the first to adapt it to online environments by providing a spatial modeling of online ensembles. To show the robustness of the proposed algorithm, we use real-world datasets and synthetic data generators from the MOA libraries. First, we analyze the impact of our weighting system on prediction accuracy through two scenarios. Second, we compare GOOWE with 8 state-of-the-art ensemble classifiers in a comprehensive experimental environment. Our experiments show that GOOWE provides improved reactions to different types of concept drift compared to our baselines.
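The LSQ weighting described in the GOOWE abstracts above can be sketched as follows. This is a hedged reading of the abstract, not the authors' code: each row of `scores` holds two components' vote scores for one (instance, class) pair over the sliding window, and `ideal` is 1 for the true class and 0 otherwise, so minimizing the Euclidean distance to the ideal points amounts to minimizing ||Sw - y||² in the weights w:

```python
# Two-component least-squares weighting via the normal equations
# (S^T S) w = S^T y, expanded by hand for the 2x2 case.

def goowe_weights_2(scores, ideal):
    """scores: list of [s1, s2] vote rows; ideal: list of 0/1 targets."""
    a11 = sum(r[0] * r[0] for r in scores)
    a12 = sum(r[0] * r[1] for r in scores)
    a22 = sum(r[1] * r[1] for r in scores)
    b1 = sum(r[0] * y for r, y in zip(scores, ideal))
    b2 = sum(r[1] * y for r, y in zip(scores, ideal))
    det = a11 * a22 - a12 * a12   # assumes the components are not identical
    w1 = (a22 * b1 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2
```

As a sanity check, if the first component's votes match the ideal points exactly and the second's are orthogonal to them, the solution assigns all weight to the first component.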
The statistical tests indicate a significant improvement in accuracy, with conservative time and memory requirements.

Item Open Access Implicit concept drift detection for multi-label data streams (Bilkent University, 2022-01) Gülcan, Ege Berkay

Many real-world applications adopt multi-label data streams as the need for algorithms that deal with rapidly generated data increases. For such streams, changes in the data distribution, also known as concept drift, cause existing classification models to rapidly lose their effectiveness. To assist the classifiers, we propose a novel algorithm called Label Dependency Drift Detector (LD3), an implicit (unsupervised) concept drift detector that uses label dependencies within the data for multi-label data streams. Our study exploits the dynamic temporal dependencies between labels using a label influence ranking method, which leverages a data fusion algorithm and uses the produced ranking to detect concept drift. LD3 is the first unsupervised concept drift detection algorithm in the multi-label classification problem area. In this study, we perform an extensive evaluation of LD3 by comparing it with 14 prevalent supervised concept drift detection algorithms that we adapt to the problem area, using 12 datasets and a baseline classifier. The results show that LD3 provides between 19.8% and 68.6% better predictive performance than comparable detectors on both real-world and synthetic data streams.

Item Open Access On-the-fly ensemble classifier pruning in evolving data streams (Bilkent University, 2019-09) Elbaşı, Sanem

Ensemble pruning is the process of selecting a subset of component classifiers from an ensemble that performs at least as well as the original ensemble while reducing storage and computational costs. Ensemble pruning in data streams is a largely unexplored area of research. It requires analyzing ensemble components as they run on the stream and differentiating useful classifiers from redundant ones.
We present two on-the-fly ensemble pruning methods: the Class-wise Component Ranking-based Pruner (CCRP) and the Cover Coefficient-based Pruner (CCP). CCRP aims to ensure that the resulting pruned ensemble contains the best-performing classifier for each target class and hence reduces the effects of class imbalance. CCP, on the other hand, aims to select components that make misclassification errors on different instances. Experiments on real-world and synthetic data streams demonstrate that different types of ensembles integrating the pruners consume significantly less memory and run significantly faster without hurting predictive performance.

Item Open Access Unsupervised concept drift detection for multi-label data streams (Springer, 2022-07-17) Gülcan, Ege Berkay; Can, Fazlı

Many real-world applications adopt multi-label data streams as the need for algorithms that deal with rapidly changing data increases. Changes in the data distribution, also known as concept drift, cause existing classification models to rapidly lose their effectiveness. To assist the classifiers, we propose a novel algorithm called Label Dependency Drift Detector (LD3), an unsupervised concept drift detector that uses label dependencies within the data for multi-label data streams. Our study exploits the dynamic temporal dependencies between labels using a label influence ranking method, which leverages a data fusion algorithm and uses the produced ranking to detect concept drift. LD3 is the first unsupervised concept drift detection algorithm in the multi-label classification problem area. In this study, we perform an extensive evaluation of LD3 by comparing it with 14 prevalent supervised concept drift detection algorithms that we adapt to the problem area, using 15 datasets and a baseline classifier. The results show that LD3 provides between 16.9% and 56% better predictive performance than comparable detectors on both real-world and synthetic data streams.
Item Open Access Unsupervised concept drift detection using sliding windows: two contributions (Bilkent University, 2020-10) Gözüaçık, Ömer

Data stream mining has become an important research area over the past decade due to the increasing amount of data available today. Sources from various domains generate a near-limitless volume of data in temporal order. Such data are referred to as data streams, and they are generally nonstationary, as the characteristics of the data evolve over time. This phenomenon is called concept drift, and it is an issue of great importance in the literature, since it makes models outdated and decreases their predictive performance. In the presence of concept drift, adapting to the change in data is necessary to obtain more robust and effective classifiers. Drift detectors are designed to run jointly with classification models, updating them when a significant change in the data distribution is observed. In this study, we propose two unsupervised concept drift detection methods: D3 and OCDD. In D3, we use a discriminative classifier over a sliding window to monitor the change in the distribution of data. When the old and the new data are separable by the discriminative classifier, a drift is signaled. In OCDD, we use a one-class classifier over a sliding window and monitor the number of outliers identified in the window. We claim that outliers are signs of a new concept, and we define concept drift detection as a continuous form of anomaly detection. A drift is signaled if the percentage of outliers exceeds a predetermined threshold. We perform a comprehensive evaluation of the latest and most prevalent concept drift detectors using 13 datasets. The results show that OCDD outperforms the other methods by producing models with significantly better predictive performance on both real-world and synthetic datasets.
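The OCDD idea described above (a one-class model over a sliding window, with drift signaled when the outlier percentage exceeds a threshold) can be sketched as follows. This is an illustrative assumption-laden sketch, not the published algorithm: a simple distance-from-mean rule on 1-D values stands in for the one-class learner, and the class name, defaults, and window split are all hypothetical.

```python
from collections import deque

class OCDDSketch:
    """Signal drift when the outlier fraction in the newer half of a
    sliding window, measured against a model fit on the older half,
    exceeds rho."""

    def __init__(self, window_size=100, rho=0.3, radius=3.0):
        self.window = deque(maxlen=window_size)
        self.rho = rho          # outlier-fraction threshold
        self.radius = radius    # stand-in one-class decision boundary

    def add(self, x):
        """Feed one instance; returns True when drift is signaled."""
        self.window.append(x)
        if len(self.window) < self.window.maxlen:
            return False                    # window not yet full
        half = len(self.window) // 2
        values = list(self.window)
        old, new = values[:half], values[half:]
        mean = sum(old) / len(old)          # "fit" the one-class model
        outliers = sum(abs(v - mean) > self.radius for v in new)
        if outliers / len(new) > self.rho:
            self.window.clear()             # reset after a detected drift
            return True
        return False
```

Feeding a stream whose values jump from one level to another makes the outlier fraction in the newer half climb until it crosses `rho`, at which point a drift is flagged and the window is reset.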
D3 is on par with the other methods.

Item Open Access Unsupervised concept drift detection with a discriminative classifier (Association for Computing Machinery, 2019) Gözüaçık, Ömer; Büyükçakır, Alican; Bonab, H.; Can, Fazlı

In data stream mining, one of the biggest challenges is to develop algorithms that deal with changing data. As data evolve over time, static models become outdated. This phenomenon is called concept drift, and it is investigated extensively in the literature. Detecting and subsequently adapting to concept drifts yields more robust and better-performing models. In this study, we present an unsupervised method called D3, which uses a discriminative classifier with a sliding window to detect concept drift by monitoring changes in the feature space. It is a simple method that can be used along with any existing classifier that does not intrinsically have a drift adaptation mechanism. We experiment with the most prevalent concept drift detectors using 8 datasets. The results demonstrate that D3 outperforms the baselines, yielding models with higher performance on both real-world and synthetic datasets.
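The D3 idea above (label an old window 0 and a new window 1, and signal drift when a discriminative classifier can separate them) can be sketched as follows. This is a hedged simplification: the paper uses a trained classifier scored by AUC, whereas in this 1-D sketch the feature value itself serves as the discriminant score, so the AUC reduces to a rank comparison; the function names and the `tau` default are assumptions.

```python
def auc_from_scores(old, new):
    """Probability that a random new-window value outranks a random
    old-window value (ties count as 0.5), i.e. the AUC of the identity
    scoring function for separating the two windows."""
    wins = sum((n > o) + 0.5 * (n == o) for n in new for o in old)
    return wins / (len(old) * len(new))

def d3_drift(old, new, tau=0.7):
    """Signal drift if the windows are separable in either direction."""
    auc = auc_from_scores(old, new)
    return max(auc, 1.0 - auc) > tau
```

If the two windows come from the same distribution, the AUC sits near 0.5 and no drift is signaled; once the feature distribution shifts, the AUC moves toward 0 or 1 and crosses the threshold.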