Data imputation through the identification of local anomalies

Ozkan, H.; Pelvan, O. S.; Kozat, S. S.

dc.citation.epage	2395	en_US
dc.citation.issueNumber	10	en_US
dc.citation.spage	2381	en_US
dc.citation.volumeNumber	26	en_US
dc.contributor.author	Ozkan, H.	en_US
dc.contributor.author	Pelvan, O. S.	en_US
dc.contributor.author	Kozat, S. S.	en_US
dc.date.accessioned	2016-02-08T09:37:07Z
dc.date.available	2016-02-08T09:37:07Z
dc.date.issued	2015	en_US
dc.department	Department of Electrical and Electronics Engineering	en_US
dc.description.abstract	We introduce a comprehensive statistical framework in a model-free setting for the complete treatment of localized data corruptions caused by severe noise sources, e.g., an occluder in a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions in a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization of the Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous or normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independence structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be set directly without any parameter tuning. The proposed framework is tested on several well-known machine learning data sets with synthetically generated corruptions and is experimentally shown to produce remarkable improvements in classification performance, with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. © 2015 IEEE.	en_US
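The split-test-impute pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the halving split of attribute indices, the per-part nearest-neighbour squared Euclidean score, the per-node leave-one-out quantile threshold `alpha`, and the nearest-neighbour imputation (used here in place of the paper's maximum a posteriori estimator and its rank-based distance) are all assumptions, and the function names `build_tree`, `part_score`, `thresholds`, `localize` and `impute` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Node = Tuple[int, int]  # half-open attribute range [lo, hi)


def build_tree(lo: int, hi: int, min_size: int = 1) -> List[Node]:
    """Balanced binary partition of attributes [lo, hi); returns every node, root first."""
    nodes = [(lo, hi)]
    if hi - lo > min_size:
        mid = (lo + hi) // 2
        nodes += build_tree(lo, mid, min_size) + build_tree(mid, hi, min_size)
    return nodes


def part_score(x: Vector, refs: Sequence[Vector], node: Node, skip: int = -1) -> float:
    """Squared Euclidean distance from x to its nearest reference on the node's attributes."""
    lo, hi = node
    return min(
        sum((x[j] - r[j]) ** 2 for j in range(lo, hi))
        for i, r in enumerate(refs)
        if i != skip  # skip lets a reference row be scored leave-one-out
    )


def thresholds(refs: Sequence[Vector], nodes: List[Node], alpha: float) -> Dict[Node, float]:
    """Per-node (1 - alpha) empirical quantile of leave-one-out scores on clean data."""
    out = {}
    for node in nodes:
        scores = sorted(part_score(r, refs, node, skip=i) for i, r in enumerate(refs))
        out[node] = scores[min(len(scores) - 1, int((1 - alpha) * len(scores)))]
    return out


def localize(x: Vector, refs: Sequence[Vector], node: Node,
             th: Dict[Node, float], min_size: int = 1) -> List[int]:
    """Top-down test: descend only into parts flagged as locally anomalous."""
    lo, hi = node
    if part_score(x, refs, node) <= th[node]:
        return []  # this part looks nominal
    if hi - lo <= min_size:
        return list(range(lo, hi))  # anomalous leaf
    mid = (lo + hi) // 2
    flagged = (localize(x, refs, (lo, mid), th, min_size)
               + localize(x, refs, (mid, hi), th, min_size))
    # Neither half is anomalous on its own: flag the whole part.
    return flagged or list(range(lo, hi))


def impute(x: Vector, refs: Sequence[Vector], flagged: List[int]) -> List[float]:
    """Copy flagged attributes from the reference row nearest on the unflagged ones."""
    bad = set(flagged)
    keep = [j for j in range(len(x)) if j not in bad]
    donor = min(refs, key=lambda r: sum((x[j] - r[j]) ** 2 for j in keep))
    return [donor[j] if j in bad else x[j] for j in range(len(x))]


rng = random.Random(0)
d = 8
refs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(200)]  # clean reference set
th = thresholds(refs, build_tree(0, d), alpha=0.05)

x = list(refs[0])       # toy instance: a copy of a clean reference row
for j in range(4, 8):
    x[j] += 10.0        # synthetic local corruption on attributes 4..7
flagged = localize(x, refs, (0, d), th)
print(flagged)
print(impute(x, refs, flagged))
```

Thresholding each node at a quantile of clean leave-one-out scores keeps the per-test false alarm rate near `alpha` on clean data; this is only loosely analogous to, and does not reproduce, the paper's analytical false alarm result.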
dc.identifier.doi	10.1109/TNNLS.2014.2382606	en_US
dc.identifier.issn	2162-237X
dc.identifier.uri	http://hdl.handle.net/11693/20881
dc.language.iso	English	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/TNNLS.2014.2382606	en_US
dc.source.title	IEEE Transactions on Neural Networks and Learning Systems	en_US
dc.subject	Anomaly detection	en_US
dc.subject	localized corruption	en_US
dc.subject	Algorithms	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Binary trees	en_US
dc.subject	Classification (of information)	en_US
dc.subject	Crime	en_US
dc.subject	Forestry	en_US
dc.subject	Iterative methods	en_US
dc.subject	Learning systems	en_US
dc.subject	Statistical tests	en_US
dc.subject	Euclidean distance	en_US
dc.subject	Maximum a posteriori	en_US
dc.subject	Maximum a Posteriori Estimator	en_US
dc.subject	occlusion	en_US
dc.subject	Parameter-tuning	en_US
dc.subject	Statistical framework	en_US
dc.subject	Trees (mathematics)	en_US
dc.title	Data imputation through the identification of local anomalies	en_US
dc.type	Article	en_US

Files

Original bundle

Name: Data Imputation Through the Identification of Local Anomalies.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Department of Electrical and Electronics Engineering