Online learning under adverse settings
Embargo Lift Date: 2016-06-01
Kozat, S. Serdar
Item Usage Stats
MetadataShow full item record
We present novel solutions for contemporary real life applications that generate data at unforeseen rates in unpredictable forms including non-stationarity, corruptions, missing/mixed attributes and high dimensionality. In particular, we introduce novel algorithms for online learning, where the observations are received sequentially and processed only once without being stored, under adverse settings: i) no or limited assumptions can be made about the data source, ii) the observations can be corrupted and iii) the data is to be processed at extremely fast rates. The introduced algorithms are highly effective and efficient with strong mathematical guarantees; and are shown, through the presented comprehensive real life experiments, to significantly outperform the competitors under such adverse conditions. We develop a novel highly dynamical ensemble method without any stochastic assumptions on the data source. The presented method is asymptotically guaranteed to perform as well as, i.e., competitive against, the best expert in the ensemble, where the competitor, i.e., the best expert, itself is also specifically designed to continuously improve over time in a completely data adaptive manner. In addition, our algorithm achieves a significantly superior modeling power (hence, a significantly superior prediction performance) through a hierarchical and self-organizing approach while mitigating over training issues by combining (taking finite unions of) low-complexity methods. On the contrary, the state-of-the-art ensemble techniques are heavily dependent on static and unstructured expert ensembles. In this regard, we rigorously solve the resulting issues such as the over sensitivity to source statistics as well as the incompatibility between the modeling power and the computational load/precision. Our results uniformly hold for every possible input stream in the deterministic sense regardless of the stationary or non-stationary source statistics. Furthermore, we directly address the data corruptions by developing novel versatile imputation methods and thoroughly demonstrate that the anomaly detection -in addition to being stand alone an important learning problem- is extremely effective for corruption detection/imputation purposes. To that end, as the first time in the literature, we develop the online implementation of the Neyman-Pearson characterization for anomalies in stationary or non-stationary fast streaming temporal data. The introduced anomaly detection algorithm maximizes the detection power at a specified controllable constant false alarm rate with no parameter tuning in a truly online manner. Our algorithms can process any streaming data at extremely fast rates without requiring a training phase or a priori information while bearing strong performance guarantees. Through extensive experiments over real/synthetic benchmark data sets, we also show that our algorithms significantly outperform the state-of-the-art as well as the most recently proposed techniques in the literature with remarkable adaptation capabilities to non-stationarity.
Hidden markov model
Partially observable HMM states