Unsupervised concept drift detection using sliding windows: two contributions
Data stream mining has become an important research area over the past decade due to the increasing amount of data available today. Sources from various domains generate limitless volumes of data in temporal order. Such data are referred to as data streams, and they are generally nonstationary because the characteristics of the data evolve over time. This phenomenon is called concept drift, and it is an issue of great importance in the literature since it makes models outdated and decreases their predictive performance. In the presence of concept drift, adapting to the change in the data is necessary to obtain more robust and effective classifiers. Drift detectors are designed to run jointly with classification models, updating them when a significant change in the data distribution is observed. In this study, we propose two unsupervised concept drift detection methods: D3 and OCDD. In D3, we use a discriminative classifier over a sliding window to monitor changes in the data distribution. When the old and the new data are separable by the discriminative classifier, a drift is signaled. In OCDD, we use a one-class classifier over a sliding window and monitor the number of outliers identified in the window. We claim that outliers are signs of a new concept, and we define concept drift detection as a continuous form of anomaly detection. A drift is signaled if the percentage of outliers exceeds a predetermined threshold. We perform a comprehensive evaluation against the latest and most prevalent concept drift detectors using 13 datasets. The results show that OCDD outperforms the other methods by producing models with significantly better predictive performance on both real-world and synthetic datasets. D3 is on par with the other methods.
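The outlier-percentage rule described for OCDD can be illustrated with a minimal sketch. It is not the authors' implementation. The class `CentroidOneClass`, the function `detect_drifts`, and the parameters `window_size`, `threshold`, and `quantile` are hypothetical names introduced here, and a simple centroid-distance rule stands in for whichever one-class classifier the method actually uses. The retrain-on-drift step is likewise an assumption about how the detector would be reset.

```python
import math
from collections import deque


class CentroidOneClass:
    """Stand-in one-class model (an assumption, not the paper's classifier):
    a point is an outlier if it lies farther from the training centroid
    than roughly `quantile` of the training points did."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, rows):
        dims = len(rows[0])
        self.center = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        dists = sorted(math.dist(r, self.center) for r in rows)
        # Radius that covers about `quantile` of the training distances.
        idx = min(len(dists) - 1, int(self.quantile * len(dists)))
        self.radius = dists[idx]
        return self

    def is_outlier(self, row):
        return math.dist(row, self.center) > self.radius


def detect_drifts(stream, window_size=100, threshold=0.3):
    """Return the stream indices at which a drift is signaled."""
    window = deque(maxlen=window_size)  # most recent raw points
    flags = deque(maxlen=window_size)   # outlier flags since the last fit
    model = None
    drifts = []
    for i, row in enumerate(stream):
        window.append(row)
        if model is None:
            # Warm-up: fit the one-class model on the first full window.
            if len(window) == window_size:
                model = CentroidOneClass().fit(list(window))
            continue
        flags.append(model.is_outlier(row))
        # Signal a drift when the share of outliers in a full window
        # exceeds the predetermined threshold.
        if len(flags) == window_size and sum(flags) / window_size > threshold:
            drifts.append(i)
            # Assumed reset step: refit on the most recent window.
            model = CentroidOneClass().fit(list(window))
            flags.clear()
    return drifts
```

Passing a list of equal-length numeric tuples to `detect_drifts` returns the positions where the outlier share crossed `threshold`. On a stream whose mean shifts abruptly, the first reported index should fall shortly after the shift, because the window needs to accumulate enough points from the new concept before the percentage rule fires.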