Online learning under adverse settings
buir.advisor | Kozat, S. Serdar | |
dc.contributor.author | Özkan, Hüseyin | |
dc.date.accessioned | 2016-05-02T07:12:24Z | |
dc.date.available | 2016-05-02T07:12:24Z | |
dc.date.copyright | 2015-05 | |
dc.date.issued | 2015-05 | |
dc.date.submitted | 01-06-2015 | |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (leaves 145-164). | en_US |
dc.description | Thesis (Ph. D.): Bilkent University, Department of Electrical and Electronics Engineering, İhsan Doğramacı Bilkent University, 2015. | en_US |
dc.description.abstract | We present novel solutions for contemporary real-life applications that generate data at unforeseen rates and in unpredictable forms, including non-stationarity, corruptions, missing/mixed attributes and high dimensionality. In particular, we introduce novel algorithms for online learning, where the observations are received sequentially and processed only once without being stored, under adverse settings: i) no or limited assumptions can be made about the data source, ii) the observations can be corrupted and iii) the data is to be processed at extremely fast rates. The introduced algorithms are highly effective and efficient with strong mathematical guarantees, and are shown, through comprehensive real-life experiments, to significantly outperform the competitors under such adverse conditions. We develop a novel, highly dynamic ensemble method without any stochastic assumptions on the data source. The presented method is asymptotically guaranteed to perform as well as (i.e., to be competitive against) the best expert in the ensemble, where this competitor, the best expert, is itself specifically designed to improve continuously over time in a completely data-adaptive manner. In addition, our algorithm achieves significantly superior modeling power (hence, significantly superior prediction performance) through a hierarchical and self-organizing approach, while mitigating overtraining issues by combining (taking finite unions of) low-complexity methods. In contrast, the state-of-the-art ensemble techniques depend heavily on static and unstructured expert ensembles. In this regard, we rigorously solve the resulting issues, such as the oversensitivity to source statistics as well as the incompatibility between the modeling power and the computational load/precision. Our results hold uniformly for every possible input stream in the deterministic sense, regardless of whether the source statistics are stationary or non-stationary. Furthermore, we directly address data corruptions by developing novel, versatile imputation methods and thoroughly demonstrate that anomaly detection, in addition to being an important standalone learning problem, is extremely effective for corruption detection/imputation purposes. To that end, for the first time in the literature, we develop the online implementation of the Neyman-Pearson characterization for anomalies in stationary or non-stationary fast streaming temporal data. The introduced anomaly detection algorithm maximizes the detection power at a specified, controllable constant false alarm rate with no parameter tuning in a truly online manner. Our algorithms can process any streaming data at extremely fast rates without requiring a training phase or a priori information while bearing strong performance guarantees. Through extensive experiments over real/synthetic benchmark data sets, we also show that our algorithms significantly outperform the state-of-the-art as well as the most recently proposed techniques in the literature, with remarkable adaptation capabilities to non-stationarity. | en_US |
dc.description.provenance | Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2016-05-02T07:12:24Z No. of bitstreams: 1 thesis.pdf: 3505007 bytes, checksum: ad0f438ba0b866787e7d9f62174d2f03 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2016-05-02T07:12:24Z (GMT). No. of bitstreams: 1 thesis.pdf: 3505007 bytes, checksum: ad0f438ba0b866787e7d9f62174d2f03 (MD5) Previous issue date: 2015-05 | en |
dc.description.statementofresponsibility | by Hüseyin Özkan. | en_US |
dc.embargo.release | 2016-06-01 | |
dc.format.extent | xvi, 164 leaves : charts. | en_US |
dc.identifier.itemid | B150302 | |
dc.identifier.uri | http://hdl.handle.net/11693/29022 | |
dc.language.iso | English | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Online Learning | en_US |
dc.subject | Supervised learning | en_US |
dc.subject | Prediction | en_US |
dc.subject | Classification | en_US |
dc.subject | Regression | en_US |
dc.subject | Anomaly detection | en_US |
dc.subject | Big data | en_US |
dc.subject | Adverse conditions | en_US |
dc.subject | Deterministic analysis | en_US |
dc.subject | Worst case | en_US |
dc.subject | Non-stationarity | en_US |
dc.subject | Concept change | en_US |
dc.subject | Self-organizing | en_US |
dc.subject | Decision tree | en_US |
dc.subject | Hidden Markov model | en_US |
dc.subject | HMM | en_US |
dc.subject | Partially observable HMM states | en_US |
dc.subject | Label errors | en_US |
dc.subject | Corruption | en_US |
dc.subject | Noise | en_US |
dc.subject | Anomaly | en_US |
dc.subject | Imputation | en_US |
dc.subject | Time series | en_US |
dc.subject | Neyman-Pearson | en_US |
dc.title | Online learning under adverse settings | en_US |
dc.title.alternative | Karşıt koşullar altında çevrimiçi öğrenme | en_US |
dc.type | Thesis | en_US |
thesis.degree.discipline | Electrical and Electronic Engineering | |
thesis.degree.grantor | Bilkent University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Ph.D. (Doctor of Philosophy) | |