Browsing by Subject "Anomaly detection"
Now showing 1 - 20 of 27
Item Open Access Akut koroner sendromun destek vektör makinelerine ve EKG’ye dayalı tespiti [Detection of acute coronary syndrome based on support vector machines and ECG] (IEEE, 2019-04) Terzi, Merve Begüm; Arıkan, Orhan
In patients with acute coronary syndrome (ACS), changes occur in the ST segment and T wave of the ECG signal, together with transient chest pain, before the onset of myocardial infarction. In this study, to achieve robust detection of ACS, a technique is developed that detects anomalies in the ST segment and T wave of the ECG signal using state-of-the-art signal processing and machine learning techniques. To this end, using the wideband recordings in the STAFF III database, a novel feature extraction technique is developed that obtains the ECG features with the highest discriminative power for the diagnosis of ACS. Using the critical features obtained, a supervised learning technique based on support vector machines (SVM) and kernel functions is developed that performs robust detection of ACS. The performance results of the proposed technique, obtained from a considerable number of patients in the STAFF III database, show that the technique provides highly reliable ACS detection.
Item Open Access Anomaly detection with sparse unmixing and Gaussian mixture modeling of hyperspectral images (2015-07) Erdinç, Acar
One of the main applications of hyperspectral image analysis is anomaly detection, where the problem of interest is the detection of small rare objects that stand out from their surroundings. A common approach to anomaly detection is to first model the background scene and then to use a detector that quantifies the difference of a particular pixel from this background. However, identifying the dominant background components and modeling them is a challenging task. We propose an anomaly detection framework that uses Gaussian mixture models for characterizing the scene background in hyperspectral images.
First, the full spectrum is divided into several contiguous band groups for dimensionality reduction as well as for exploiting the peculiarities of different parts of the spectrum. Then, sparse spectral unmixing is performed for each band group for identifying significant endmembers in the scene. Three methods for identifying the dominant background groups (thresholding, hierarchical clustering, and biclustering) are used in the endmember abundance space to retrieve the sets of pixel groups that represent dominant background components. Next, these pixel groups are used for initializing individual Gaussian mixture models that are estimated separately for each spectral band group. The proposed method enables automatic identification of the number of mixture components and effective initialization of the estimation procedure for the mixture model. Finally, the Gaussian mixture models for all groups are statistically fused for obtaining the final anomaly map for the scene. Comparative experiments showed that the proposed methods performed better than two other density-based anomaly detectors, especially for small false positive rates, on an airborne hyperspectral data set.
Item Open Access Artificial intelligence-based hybrid anomaly detection and clinical decision support techniques for automated detection of cardiovascular diseases and Covid-19 (2023-10) Terzi, Merve Begüm
Coronary artery diseases are the leading cause of death worldwide, and early diagnosis is crucial for timely treatment. To address this, we present a novel automated artificial intelligence-based hybrid anomaly detection technique composed of various signal processing, feature extraction, supervised, and unsupervised machine learning methods.
By jointly and simultaneously analyzing 12-lead electrocardiogram (ECG) and cardiac sympathetic nerve activity (CSNA) data, the automated artificial intelligence-based hybrid anomaly detection technique performs fast, early, and accurate diagnosis of coronary artery diseases. To develop and evaluate the proposed automated artificial intelligence-based hybrid anomaly detection technique, we utilized the fully labeled STAFF III and PTBD databases, which contain 12-lead wideband raw recordings noninvasively acquired from 260 subjects. Using the wideband raw recordings in these databases, we developed a signal processing technique that simultaneously detects the 12-lead ECG and CSNA signals of all subjects. Subsequently, using the pre-processed 12-lead ECG and CSNA signals, we developed a time-domain feature extraction technique that extracts the statistical CSNA and ECG features critical for the reliable diagnosis of coronary artery diseases. Using the extracted discriminative features, we developed a supervised classification technique based on artificial neural networks that simultaneously detects anomalies in the 12-lead ECG and CSNA data. Furthermore, we developed an unsupervised clustering technique based on the Gaussian mixture model and Neyman-Pearson criterion that performs robust detection of the outliers corresponding to coronary artery diseases. By using the automated artificial intelligence-based hybrid anomaly detection technique, we have demonstrated a significant association between the increase in the amplitude of CSNA signal and anomalies in ECG signal during coronary artery diseases. The automated artificial intelligence-based hybrid anomaly detection technique performed highly reliable detection of coronary artery diseases with a sensitivity of 98.48%, specificity of 97.73%, accuracy of 98.11%, positive predictive value (PPV) of 97.74%, negative predictive value (NPV) of 98.47%, and F1-score of 98.11%.
Hence, the artificial intelligence-based hybrid anomaly detection technique has superior performance compared to the gold standard diagnostic test ECG in diagnosing coronary artery diseases. Additionally, it outperformed other techniques developed in this study that separately utilize either only CSNA data or only ECG data. Therefore, it significantly increases the detection performance of coronary artery diseases by taking advantage of the diversity in different data types and leveraging their strengths. Furthermore, its performance is comparatively better than that of most previously proposed machine and deep learning methods that exclusively used ECG data to diagnose or classify coronary artery diseases. It also has a very short implementation time, which is highly desirable for real-time detection of coronary artery diseases in clinical practice. The proposed automated artificial intelligence-based hybrid anomaly detection technique may serve as an efficient decision-support system to increase physicians' success in achieving fast, early, and accurate diagnosis of coronary artery diseases. It may be highly beneficial and valuable, particularly for asymptomatic coronary artery disease patients, for whom the diagnostic information provided by ECG alone is not sufficient to reliably diagnose the disease. Hence, it may significantly improve patient outcomes, enable timely treatments, and reduce the mortality associated with cardiovascular diseases.
Secondly, we propose a new automated artificial intelligence-based hybrid clinical decision support technique that jointly analyzes reverse transcriptase polymerase chain reaction (RT-PCR) curves, thorax computed tomography images, and laboratory data to perform fast and accurate diagnosis of Coronavirus disease 2019 (COVID-19).
For this purpose, we retrospectively created the fully labeled Ankara University Faculty of Medicine COVID-19 (AUFM-CoV) database, which contains a wide variety of medical data, including RT-PCR curves, thorax computed tomography images, and laboratory data. The AUFM-CoV is the most comprehensive database that includes thorax computed tomography images of COVID-19 pneumonia (CVP), other viral and bacterial pneumonias (VBP), and parenchymal lung diseases (PLD), all of which present significant challenges for differential diagnosis. We developed a new automated artificial intelligence-based hybrid clinical decision support technique, which is an ensemble learning technique consisting of two preprocessing methods, a long short-term memory network-based deep learning method, a convolutional neural network-based deep learning method, and an artificial neural network-based machine learning method. By jointly analyzing RT-PCR curves, thorax computed tomography images, and laboratory data, the proposed automated artificial intelligence-based hybrid clinical decision support technique benefits from the diversity in different data types that are critical for the reliable detection of COVID-19 and leverages their strengths. The multi-class classification performance results of the proposed convolutional neural network-based deep learning method on the AUFM-CoV database showed that it achieved highly reliable detection of COVID-19 with a sensitivity of 91.9%, specificity of 92.5%, precision of 80.4%, and F1-score of 86%. Therefore, it outperformed thorax computed tomography in terms of the specificity of COVID-19 diagnosis. Moreover, the convolutional neural network-based deep learning method has been shown to very successfully distinguish COVID-19 pneumonia (CVP) from other viral and bacterial pneumonias (VBP) and parenchymal lung diseases (PLD), which exhibit very similar radiological findings.
Therefore, it has great potential to be successfully used in the differential diagnosis of pulmonary diseases containing ground-glass opacities. The binary classification performance results of the proposed convolutional neural network-based deep learning method showed that it achieved a sensitivity of 91.5%, specificity of 94.8%, precision of 85.6%, and F1-score of 88.4% in diagnosing COVID-19. Hence, it has comparable sensitivity to thorax computed tomography in diagnosing COVID-19. Additionally, the binary classification performance results of the proposed long short-term memory network-based deep learning method on the AUFM-CoV database showed that it performed highly reliable detection of COVID-19 with a sensitivity of 96.6%, specificity of 99.2%, precision of 98.1%, and F1-score of 97.3%. Thus, it outperformed the gold standard RT-PCR test in terms of the sensitivity of COVID-19 diagnosis. Furthermore, the multi-class classification performance results of the proposed automated artificial intelligence-based hybrid clinical decision support technique on the AUFM-CoV database showed that it diagnosed COVID-19 with a sensitivity of 66.3%, specificity of 94.9%, precision of 80%, and F1-score of 73%. Hence, it has been shown to very successfully perform the differential diagnosis of COVID-19 pneumonia (CVP) and other pneumonias. The binary classification performance results of the automated artificial intelligence-based hybrid clinical decision support technique revealed that it diagnosed COVID-19 with a sensitivity of 90%, specificity of 92.8%, precision of 91.8%, and F1-score of 90.9%. Therefore, it exhibits superior sensitivity and specificity compared to laboratory data in COVID-19 diagnosis.
The performance results of the proposed automated artificial intelligence-based hybrid clinical decision support technique on the AUFM-CoV database demonstrate its ability to provide highly reliable diagnosis of COVID-19 by jointly analyzing RT-PCR data, thorax computed tomography images, and laboratory data. Consequently, it may significantly increase the success of physicians in diagnosing COVID-19, assist them in rapidly isolating and treating COVID-19 patients, and reduce their workload in daily clinical practice.
Item Open Access Client-specific anomaly detection for face presentation attack detection (Elsevier, 2020) Fatemifar, S.; Arashloo, Shervin Rahimzadeh; Awais, M.; Kittler, J.
One-class anomaly detection approaches are particularly appealing for use in face presentation attack detection (PAD), especially in an unseen attack scenario, where the system is exposed to novel types of attacks. This work builds upon an anomaly-based formulation of the problem and analyses the merits of deploying client-specific information for face spoofing detection. We propose training one-class client-specific classifiers (both generative and discriminative) using representations obtained from pre-trained deep Convolutional Neural Networks (CNN). In order to incorporate client-specific information, a distinct threshold is set for each client based on subject-specific score distributions, which is then used for decision making at test time. Through extensive experiments using different one-class systems, it is shown that the use of client-specific information in a one-class anomaly detection formulation (both in model construction and in decision boundary selection) improves the performance significantly. We also show that anomaly-based solutions have the capacity to perform as well as or better than two-class approaches in unseen attack scenarios.
Moreover, it is shown that CNN features obtained from models trained for face recognition appear to discard discriminative traits for spoofing detection and are less capable for PAD compared to the CNNs trained for a generic object recognition task.
Item Open Access Data imputation through the identification of local anomalies (Institute of Electrical and Electronics Engineers Inc., 2015) Ozkan, H.; Pelvan, O. S.; Kozat, S. S.
We introduce a comprehensive and statistical framework in a model-free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization of Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independence structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning.
The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and is experimentally shown to produce remarkable improvements for classification purposes, with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. © 2015 IEEE.
Item Open Access Detection of cardiac arrhythmia using autonomic nervous system, Gaussian mixture model and artificial neural network (Institute of Electrical and Electronics Engineers, 2020) Terzi, Merve Begüm; Arıkan, Orhan
In this study, a new technique which detects anomalies in skin sympathetic nerve activity (SKNA) by using state-of-the-art signal processing and machine learning methods is developed to perform the robust detection of cardiac arrhythmia (CA). For this purpose, a signal processing technique that simultaneously obtains SKNA and ECG from wideband recordings on the MIT-BIH database is developed. By using preprocessed data, a novel feature extraction technique which obtains SKNA features that are critical for the reliable detection of CA is developed. By using extracted features, a supervised learning technique based on artificial neural network (ANN) and an unsupervised learning technique based on Gaussian mixture model (GMM) are developed to perform the robust detection of SKNA anomalies. A Neyman-Pearson type of approach is developed to perform the robust detection of outliers that correspond to CA. The performance results of the proposed technique over the MIT-BIH database showed that the technique provides highly reliable detection of CA by performing the robust detection of SKNA anomalies.
Therefore, in cases where the diagnostic information of ECG is not sufficient for the reliable diagnosis of CA, the proposed technique can provide early diagnosis of the disease, which can lead to a significant reduction in the mortality rates of cardiovascular diseases.
Item Open Access Detection of myocardial infarction using autonomic nervous system, Gaussian mixture model and artificial neural network (Institute of Electrical and Electronics Engineers, 2020) Terzi, Merve Begüm; Arıkan, Orhan
In this study, a new technique which detects anomalies in skin sympathetic nerve activity (SKNA) and ECG by using state-of-the-art signal processing and machine learning methods is developed to perform the robust detection of myocardial infarction (MI). For this purpose, a signal processing technique that simultaneously obtains SKNA and ECG from wideband recordings on the PTB-EKG database is developed. By using preprocessed data, a novel feature extraction technique which obtains SKNA features that are critical for the reliable detection of MI is developed. By using extracted features, a supervised learning technique based on artificial neural network (ANN) and an unsupervised learning technique based on Gaussian mixture model (GMM) are developed to perform the robust detection of SKNA anomalies. A Neyman-Pearson type of approach is developed to perform the robust detection of outliers that correspond to MI. The performance results of the proposed technique over the PTB-EKG database showed that the technique provides highly reliable detection of MI by performing the robust detection of SKNA anomalies.
Therefore, in cases where the diagnostic information of ECG is not sufficient for the reliable diagnosis of MI, the proposed technique can provide early diagnosis of the disease, which can lead to a significant reduction in the mortality rates of cardiovascular diseases.
Item Open Access Efficient NP tests for anomaly detection over birth-death type DTMCs (Springer New York LLC, 2018) Özkan, H.; Özkan, F.; Delibalta, I.; Kozat, Süleyman S.
We propose computationally highly efficient Neyman-Pearson (NP) tests for anomaly detection over birth-death type discrete time Markov chains. Instead of relying on extensive Monte Carlo simulations (as in the case of the baseline NP), we directly approximate the log-likelihood density to match the desired false alarm rate, and thereby obtain our efficient implementations. The proposed algorithms are appropriate for processing large scale data in online applications with real time false alarm rate controllability. Since we do not require parameter tuning, our algorithms are also adaptive to non-stationarity in the data source. In our experiments, the proposed tests demonstrate superior detection power compared to the baseline NP while nearly achieving the desired rates with negligible computational resources.
Item Open Access Flexible test-bed for unusual behavior detection (ACM, 2007-07) Petrás I.; Beleznai, C.; Dedeoğlu, Yiğithan; Pardàs, M.; Kovács L.; Szlávik, Z.; Havasi L.; Szirányi, T.; Töreyin, B. Uğur; Güdükbay, Uğur; Çetin, Ahmet Enis; Canton-Ferrer, C.
Visual surveillance and activity analysis is an active research field of computer vision. As a result, several different algorithms have been produced for this purpose. To obtain more robust systems, it is desirable to integrate the different algorithms. To help achieve this goal, we propose a flexible, distributed software collaboration framework and present a prototype system for automatic event analysis.
Copyright 2007 ACM.
Item Open Access Keyframe labeling technique for surveillance event classification (S P I E - International Society for Optical Engineering, 2010) Şaykol, E.; Baştan M.; Güdükbay, Uğur; Ulusoy, Özgür
The huge amount of video data generated by surveillance systems necessitates the use of automatic tools for their efficient analysis, indexing, and retrieval. Automated access to the semantic content of surveillance videos to detect anomalous events is among the basic tasks; however, due to the high variability of the audio-visual features and the large size of the video input, it remains a challenging task, even though a considerable amount of research dealing with automated access to video surveillance has appeared in the literature. We propose a keyframe labeling technique, especially for indoor environments, which assigns labels to keyframes extracted by a keyframe detection algorithm, and hence transforms the input video to an event-sequence representation. This representation is used to detect unusual behaviors, such as crossover, deposit, and pickup, with the help of three separate mechanisms based on finite state automata. The keyframes are detected based on a grid-based motion representation of the moving regions, called the motion appearance mask. It has been shown through performance experiments that the keyframe labeling algorithm significantly reduces the storage requirements and yields reasonable event detection and classification performance. © 2010 Society of Photo-Optical Instrumentation Engineers.
Item Open Access Koroner arter hastalığının destek vektör makineleri ve Gauss karışım modeli ile tespiti [Detection of coronary artery disease using support vector machines and Gaussian mixture model] (IEEE, 2019-04) Terzi, Merve Begüm; Arıkan, Orhan
In this study, to achieve robust detection of coronary artery disease (CAD), a technique is developed that detects anomalies in the ECG using state-of-the-art signal processing and machine learning methods. To this end, using the wideband recordings in the European ST-T database, a novel feature extraction technique is developed that obtains the ECG features critical for the reliable detection of CAD. Using the obtained features, a supervised learning technique based on support vector machines (SVM) and kernel functions is developed that performs robust detection of CAD. For cases where ischemic ECG data are lacking, an unsupervised learning technique based on the Gaussian mixture model (GMM) is developed that performs robust detection of CAD using only baseline ECG data. A Neyman-Pearson type approach is developed to perform robust detection of the outliers that represent CAD. The performance results of the proposed technique on the European ST-T database show that the technique provides highly reliable CAD detection.
Item Open Access A novel anomaly detection approach based on neural networks (Institute of Electrical and Electronics Engineers, 2018) Ergen, Tolga; Kerpiççi, Mine
In this paper, we introduce an anomaly detection algorithm based on Long Short-Term Memory (LSTM) networks, which works in an unsupervised framework. We first introduce an LSTM-based structure for variable-length data sequences to obtain fixed-length sequences. Then, we propose a scoring function for anomaly detection based on the One-Class Support Vector Machine (OC-SVM) algorithm. For training, we propose a gradient-based algorithm to find the optimal parameters for both the LSTM architecture and the OC-SVM formulation. Since we modify the original OC-SVM formulation, we also provide the convergence results of the modified formulation to the original one. Thus, the proposed algorithm is able to process data with variable-length sequences. Also, the algorithm provides high performance for time series data.
In our experiments, we illustrate significant performance improvements with respect to the conventional methods.
Item Open Access A novel distributed anomaly detection algorithm based on support vector machines (Elsevier, 2020-01) Ergen, Tolga; Kozat, Süleyman S.
In this paper, we study anomaly detection in a distributed network of nodes and introduce a novel algorithm based on Support Vector Machines (SVMs). We first reformulate the conventional SVM optimization problem for a distributed network of nodes. We then directly train the parameters of this SVM architecture in its primal form using a gradient-based algorithm in a fully distributed manner, i.e., each node in our network is allowed to communicate only with its neighboring nodes in order to train the parameters. Therefore, we not only obtain a high-performing anomaly detection algorithm thanks to the strong modeling capabilities of SVMs, but also achieve significantly reduced communication load and computational complexity due to our fully distributed and efficient gradient-based training. Here, we provide a training algorithm in a supervised framework; however, we also provide the extensions of our implementation to an unsupervised framework. We illustrate the performance gains achieved by our algorithm via several benchmark real life and synthetic experiments.
Item Unknown Online anomaly detection in case of limited feedback with accurate distribution learning (IEEE, 2017) Marivani, Iman; Kari, Dariush; Kurt, Ali Emirhan; Manış, Eren
We propose a high-performance algorithm for sequential anomaly detection. The proposed algorithm sequentially runs over data streams, accurately estimates the nominal distribution using the exponential family, and then declares an anomaly when the assigned likelihood of the current observation is less than a threshold. We use the estimated nominal distribution to assign a likelihood to the current observation and employ limited feedback from the end user to adjust the threshold.
The high performance of our algorithm is due to accurate estimation of the nominal distribution, which we achieve by preventing anomalous data from corrupting the update process. Our method is generic in the sense that it can operate successfully over a wide range of data distributions. We demonstrate the performance of our algorithm with respect to the state-of-the-art over time varying distributions.
Item Unknown Online anomaly detection under Markov statistics with controllable type-I error (Institute of Electrical and Electronics Engineers Inc., 2016) Ozkan, H.; Ozkan, F.; Kozat, S. S.
We study anomaly detection for fast streaming temporal data with real time Type-I error, i.e., false alarm rate, controllability, and propose a computationally highly efficient online algorithm, which closely achieves a specified false alarm rate while maximizing the detection power. Regardless of whether the source is stationary or nonstationary, the proposed algorithm sequentially receives a time series and learns the nominal attributes - in the online setting - under possibly varying Markov statistics. Then, an anomaly is declared at a time instance if the observations are statistically sufficiently deviant. Moreover, the proposed algorithm is remarkably versatile since it does not require parameter tuning to match the desired rates even in the case of strong nonstationarity. The presented study is the first to provide the online implementation of Neyman-Pearson (NP) characterization for the problem such that the NP optimality, i.e., maximum detection power at a specified false alarm rate, is nearly achieved in a truly online manner. In this regard, the proposed algorithm is highly novel and appropriate especially for the applications requiring sequential data processing at large scales/high rates due to its parameter-tuning free, computationally efficient design with the practical NP constraints under stationary or non-stationary source statistics.
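The abstract above describes declaring an anomaly when observations deviate sufficiently from the nominal statistics while holding a specified false alarm rate. A minimal offline sketch of that thresholding idea follows. It is not the authors' online algorithm: it assumes a univariate Gaussian nominal model fitted once to clean data and an empirical-quantile threshold, and every function name here is hypothetical.

```python
import math
import random


def gaussian_log_likelihood(x, mean, std):
    """Log-density of x under a univariate Gaussian N(mean, std^2)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def threshold_for_false_alarm_rate(nominal_scores, target_rate):
    """Choose a score threshold from nominal data (assumed anomaly-free).

    At most a `target_rate` fraction of the nominal scores fall strictly
    below the returned value, so it acts as an empirical quantile.
    """
    ordered = sorted(nominal_scores)
    k = int(target_rate * len(ordered))  # nominal points allowed to raise an alarm
    return ordered[k]


def is_anomaly(score, threshold):
    """Declare an anomaly when the nominal log-likelihood is below the threshold."""
    return score < threshold


# Synthetic nominal data (an assumption made purely for illustration).
rng = random.Random(0)
nominal = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Fit the nominal model by its sample mean and standard deviation.
mean = sum(nominal) / len(nominal)
std = math.sqrt(sum((x - mean) ** 2 for x in nominal) / len(nominal))

# Score the nominal data and calibrate the threshold for a 1% false alarm rate.
scores = [gaussian_log_likelihood(x, mean, std) for x in nominal]
threshold = threshold_for_false_alarm_rate(scores, target_rate=0.01)

# Fraction of nominal points that would be flagged with this threshold.
false_alarms = sum(is_anomaly(s, threshold) for s in scores)
empirical_rate = false_alarms / len(scores)
```

In this sketch the false alarm rate is set directly through `target_rate` rather than by tuning a model parameter, which mirrors the controllability the abstract emphasizes. An online variant would update both the nominal model and the threshold as each observation arrives.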
© 2015 IEEE.
Item Unknown Online anomaly detection with bandwidth optimized hierarchical kernel density estimators (IEEE, 2020) Kerpicci, M.; Ozkan, H.; Kozat, Süleyman Serdar
We propose a novel unsupervised anomaly detection algorithm that can work for sequential data from any complex distribution in a truly online framework with mathematically proven strong performance guarantees. First, a partitioning tree is constructed to generate a doubly exponentially large hierarchical class of observation space partitions, and every partition region trains an online kernel density estimator (KDE) with its own unique dynamical bandwidth. At each time, the proposed algorithm optimally combines the class estimators to sequentially produce the final density estimation. We mathematically prove that the proposed algorithm learns the optimal partition with kernel bandwidths that are optimized in both a region-specific and a time-varying manner. The estimated density is then compared with a data-adaptive threshold to detect anomalies. Overall, the computational complexity is only linear in both the tree depth and data length. In our experiments, we observe significant improvements in anomaly detection accuracy compared with the state-of-the-art techniques.
Item Unknown Online anomaly detection with minimax optimal density estimation in nonstationary environments (Institute of Electrical and Electronics Engineers, 2018) Gokcesu, K.; Kozat, Süleyman Serdar
We introduce a truly online anomaly detection algorithm that sequentially processes data to detect anomalies in time series. In anomaly detection, while the anomalous data are arbitrary, the normal data have similarities and generally conform to a particular model. However, the particular model that generates the normal data is generally unknown (even nonstationary) and needs to be learned sequentially.
Therefore, a two-stage approach is needed, where in the first stage, we construct a probability density function to model the normal data in the time series. Then, in the second stage, we threshold the density estimation of the newly observed data to detect anomalies. We approach this problem from an information theoretic perspective and propose minimax optimal schemes for both stages to create an optimal anomaly detection algorithm in a strong deterministic sense. To this end, for the first stage, we introduce a completely online density estimation algorithm that is minimax optimal with respect to the log-loss and achieves Merhav's lower bound for the general nonstationary exponential family of distributions without any assumptions on the observation sequence. For the second stage, we propose a threshold selection scheme that is minimax optimal (with logarithmic performance bounds) against the best threshold chosen in hindsight with respect to the surrogate logistic loss. Apart from the regret bounds, through synthetic and real life experiments, we demonstrate substantial performance gains with respect to the state-of-the-art density estimation based anomaly detection algorithms in the literature.
Item Unknown Online learning under adverse settings (2015-05) Özkan, Hüseyin
We present novel solutions for contemporary real life applications that generate data at unforeseen rates in unpredictable forms including non-stationarity, corruptions, missing/mixed attributes and high dimensionality. In particular, we introduce novel algorithms for online learning, where the observations are received sequentially and processed only once without being stored, under adverse settings: i) no or limited assumptions can be made about the data source, ii) the observations can be corrupted and iii) the data is to be processed at extremely fast rates.
The introduced algorithms are highly effective and efficient with strong mathematical guarantees, and are shown, through the presented comprehensive real life experiments, to significantly outperform the competitors under such adverse conditions. We develop a novel highly dynamical ensemble method without any stochastic assumptions on the data source. The presented method is asymptotically guaranteed to perform as well as, i.e., to be competitive against, the best expert in the ensemble, where the competitor, i.e., the best expert, itself is also specifically designed to continuously improve over time in a completely data adaptive manner. In addition, our algorithm achieves a significantly superior modeling power (hence, a significantly superior prediction performance) through a hierarchical and self-organizing approach while mitigating overtraining issues by combining (taking finite unions of) low-complexity methods. In contrast, the state-of-the-art ensemble techniques are heavily dependent on static and unstructured expert ensembles. In this regard, we rigorously solve the resulting issues such as the oversensitivity to source statistics as well as the incompatibility between the modeling power and the computational load/precision. Our results uniformly hold for every possible input stream in the deterministic sense regardless of the stationary or non-stationary source statistics. Furthermore, we directly address the data corruptions by developing novel versatile imputation methods and thoroughly demonstrate that anomaly detection (in addition to being an important stand-alone learning problem) is extremely effective for corruption detection/imputation purposes. To that end, for the first time in the literature, we develop the online implementation of the Neyman-Pearson characterization for anomalies in stationary or non-stationary fast streaming temporal data.
The introduced anomaly detection algorithm maximizes the detection power at a specified controllable constant false alarm rate with no parameter tuning in a truly online manner. Our algorithms can process any streaming data at extremely fast rates without requiring a training phase or a priori information while bearing strong performance guarantees. Through extensive experiments over real/synthetic benchmark data sets, we also show that our algorithms significantly outperform the state-of-the-art as well as the most recently proposed techniques in the literature with remarkable adaptation capabilities to non-stationarity.
Item Unknown Payload-based network intrusion detection using LSTM autoencoders (2020-12) Coşan, Selin
The increase in the use of computer networks by vast numbers of different devices has allowed malicious entities to develop a plethora of diverse attacks, targeting individuals and businesses. The defence systems need to be kept up to date constantly since new attacks emerge daily, in addition to having a wide range of characteristics. Intrusion detection is a branch of cyber-security that aims to prevent these attacks. Machine learning and deep learning approaches have gained popularity in this discipline, as they have in many others such as fraud detection and medicine. Given that network traffic usually displays normal behavior, anomaly detection methods can pinpoint threats by identifying connections with abnormal properties. This task can be accomplished in a supervised or an unsupervised manner. Regardless of the path, constructing meaningful representations of network data is essential. In this thesis, we employ different types of feature extraction methods for computer network data and anomaly detection strategies that can detect malicious behaviour. For the feature extraction task, we aim to obtain vector representations of network payloads such that the core information is more reachable and irrelevant information is discarded.
In our setting, the input size can vary due to the nature of the computer network data. Considering this, we use feature extraction methods that can map inputs of varying sizes into feature spaces with fixed dimensionality so that some machine learning approaches that are otherwise unusable in these settings can be employed. For the anomaly detection task, we utilize both supervised and unsupervised approaches. The supervised methods make use of the aforementioned feature extraction strategies and use the reduced and fixed dimensional representations of the computer network data. For the unsupervised case, we employ autoencoders that can extract information from sequential data. Recurrent neural networks (RNNs) can process sequential data with varying length. We specifically use autoencoders with long short-term memory (LSTM), a special form of RNN with a more complex structure that allows it to handle long-term dependencies in sequential data. Then, anomaly detection is performed using reconstruction error. We conduct experiments using dynamic and realistic data sets, which consist of various types of attacks. Then, we evaluate the validity of our proposed approaches based on AUC and F1 measures.
Item Unknown QoE evaluation in adaptive streaming enhanced MDT with deep learning (Springer, 2023-03-24) Gökçesu, Hakan; Erçetin, Ö.; Kalem, G.; Ergut, S.
We propose an architecture for performing virtual drive tests for mobile network performance evaluation by using radio signal strength data from user equipment. Our architecture comprises three main components: (i) pattern recognizer that learns a typical (nominal) behavior for application KPIs (key performance indicators); (ii) predictor that maps from network KPIs to application KPIs; (iii) anomaly detector that compares predicted application performance with said typical pattern.
To simulate user traces, we utilize a commercial state-of-the-art network optimization tool, which collects application and network KPIs at different geographical locations at various times of the day, to train an initial learning model. Although the collected data is related to an adaptive video streaming application, the proposed architecture is flexible and autonomous, and can be used for other applications. We perform extensive numerical analysis to demonstrate key parameters impacting video quality prediction and anomaly detection. Playback time is shown to be the most important parameter affecting video quality, most likely due to video packet buffering during playback. We additionally observe that network KPIs, which characterize the cellular connection strength, improve QoE (quality of experience) estimation in anomalous cases diverging from the nominal. The efficacy of our approach is demonstrated with a mean-maximum F1-score of 77%.
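
As a rough illustration of the third component described above (an anomaly detector that compares predicted application performance with a nominal pattern) and of how an F1-score is computed from its output, a minimal Python sketch follows. It is not the authors' implementation: the function names, the fixed `tolerance` threshold, and the toy KPI values are all assumptions made for illustration, and the abstract does not specify how the comparison or the "mean-maximum" aggregation is performed.

```python
from typing import List, Sequence


def flag_anomalies(predicted: Sequence[float],
                   nominal: Sequence[float],
                   tolerance: float) -> List[bool]:
    """Flag samples whose predicted KPI deviates from the nominal pattern.

    Hypothetical rule: a sample is anomalous when the absolute deviation
    exceeds a fixed tolerance (an assumption, not taken from the abstract).
    """
    return [abs(p - n) > tolerance for p, n in zip(predicted, nominal)]


def f1_score(truth: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, f in zip(truth, flagged) if t and f)
    fp = sum(1 for t, f in zip(truth, flagged) if not t and f)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): predicted application KPI vs. a flat nominal pattern.
predicted_kpi = [1.0, 1.1, 3.0, 0.9, 2.5]
nominal_kpi = [1.0, 1.0, 1.0, 1.0, 1.0]
true_anomaly = [False, False, True, True, True]

flags = flag_anomalies(predicted_kpi, nominal_kpi, tolerance=0.5)
score = f1_score(true_anomaly, flags)
print(flags, round(score, 3))
```

In this toy run the detector finds two of the three true anomalies with no false alarms, so precision is 1, recall is 2/3, and the F1-score is 0.8.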