Data imputation through the identification of local anomalies

Ozkan, H.; Pelvan, O. S.; Kozat, S. S.

Files

Data Imputation Through the Identification of Local Anomalies.pdf (1.77 MB)

Date

2015

Authors

Ozkan, H.

Pelvan, O. S.

Kozat, S. S.

Abstract

We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. © 2015 IEEE.
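The detect, localize, and impute pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it substitutes plain Euclidean distance to the reference mean for the paper's ranked-deviation distance, and a nearest-neighbor fill for the maximum a posteriori estimator. The function names (`part_score`, `threshold`, `localize`, `impute`), the halving split over attribute indices, and the empirical-quantile threshold are all assumptions made for illustration.

```python
import math


def part_score(x, mean, lo, hi):
    """Euclidean distance from x to the reference mean over attributes lo..hi-1."""
    return math.sqrt(sum((x[i] - mean[i]) ** 2 for i in range(lo, hi)))


def threshold(reference, mean, lo, hi, alpha):
    """Empirical (1 - alpha) quantile of the part score over the clean reference set.

    Assumption: alpha plays the role of a target false alarm rate for this part.
    """
    scores = sorted(part_score(r, mean, lo, hi) for r in reference)
    k = math.ceil((1 - alpha) * len(scores)) - 1
    return scores[max(0, min(k, len(scores) - 1))]


def localize(x, reference, alpha=0.05, min_size=1):
    """Walk a binary partitioning of the attribute indices and collect the
    attributes of every smallest part that still tests as anomalous."""
    d = len(x)
    mean = [sum(r[i] for r in reference) / len(reference) for i in range(d)]
    corrupted = []

    def visit(lo, hi):
        if part_score(x, mean, lo, hi) <= threshold(reference, mean, lo, hi, alpha):
            return  # part looks nominal, so stop descending
        if hi - lo <= min_size:
            corrupted.extend(range(lo, hi))  # smallest anomalous part
            return
        mid = (lo + hi) // 2
        visit(lo, mid)
        visit(mid, hi)

    visit(0, d)
    return corrupted


def impute(x, reference, corrupted):
    """Fill corrupted attributes from the reference instance that is nearest
    on the remaining attributes (a stand-in for the paper's MAP estimator)."""
    bad = set(corrupted)
    clean = [i for i in range(len(x)) if i not in bad]
    nearest = min(reference, key=lambda r: sum((x[i] - r[i]) ** 2 for i in clean))
    out = list(x)
    for i in corrupted:
        out[i] = nearest[i]
    return out


# Toy data (fabricated for illustration only): 20 clean instances, 8 attributes.
reference = [[i + 0.01 * k for i in range(8)] for k in range(20)]
suspicious = [i + 0.1 for i in range(8)]
suspicious[4] = 50.0  # simulate a localized corruption in one attribute

flagged = localize(suspicious, reference)
repaired = impute(suspicious, reference, flagged)
print(flagged, repaired)
```

In this sketch each part is tested against a threshold taken from the clean reference set alone, which loosely mirrors the abstract's point that the false alarm rate can be set directly; the analytical guarantee stated there relies on the paper's own assumptions and is not reproduced here.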

Source Title

IEEE Transactions on Neural Networks and Learning Systems

Publisher

Institute of Electrical and Electronics Engineers Inc.

Permalink

http://hdl.handle.net/11693/20881

Published Version (Please cite this version)

http://dx.doi.org/10.1109/TNNLS.2014.2382606

Collections

Scholarly Publications - Electrical and Electronics Engineering

Language

English

Type

Article
