Data imputation through the identification of local anomalies

dc.citation.epage: 2395 [en_US]
dc.citation.issueNumber: 10 [en_US]
dc.citation.spage: 2381 [en_US]
dc.citation.volumeNumber: 26 [en_US]
dc.contributor.author: Ozkan, H. [en_US]
dc.contributor.author: Pelvan, O. S. [en_US]
dc.contributor.author: Kozat, S. S. [en_US]
dc.date.accessioned: 2016-02-08T09:37:07Z
dc.date.available: 2016-02-08T09:37:07Z
dc.date.issued: 2015 [en_US]
dc.department: Department of Electrical and Electronics Engineering [en_US]
dc.description.abstract: We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose: 1) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and 2) a maximum a posteriori estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous versus normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions. © 2015 IEEE. [en_US]
dc.identifier.doi: 10.1109/TNNLS.2014.2382606 [en_US]
dc.identifier.issn: 2162-237X
dc.identifier.uri: http://hdl.handle.net/11693/20881
dc.language.iso: English [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/TNNLS.2014.2382606 [en_US]
dc.source.title: IEEE Transactions on Neural Networks and Learning Systems [en_US]
dc.subject: Anomaly detection [en_US]
dc.subject: localized corruption [en_US]
dc.subject: Algorithms [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: Binary trees [en_US]
dc.subject: Classification (of information) [en_US]
dc.subject: Crime [en_US]
dc.subject: Forestry [en_US]
dc.subject: Iterative methods [en_US]
dc.subject: Learning systems [en_US]
dc.subject: Statistical tests [en_US]
dc.subject: Euclidean distance [en_US]
dc.subject: Maximum a posteriori [en_US]
dc.subject: Maximum a Posteriori Estimator [en_US]
dc.subject: occlusion [en_US]
dc.subject: Parameter-tuning [en_US]
dc.subject: Statistical framework [en_US]
dc.subject: Trees (mathematics) [en_US]
dc.title: Data imputation through the identification of local anomalies [en_US]
dc.type: Article [en_US]
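
The pipeline outlined in the abstract (split a suspicious instance over a binary partitioning tree of its attributes, test each part against clean reference data, then impute the flagged attributes) can be illustrated with a small sketch. This is not the authors' algorithm. It is a simplified stand-in under stated assumptions: plain Euclidean nearest-neighbour distance replaces the proposed ranked-deviation measure, an empirical quantile of clean-data distances replaces the analytical false alarm setting, a top-down descent replaces the binary-pattern analysis over the tree, and nearest-neighbour copying replaces the maximum a posteriori estimator. All function names and parameters (`part_score`, `nominal_threshold`, `localize`, `impute`, `quantile`, `min_size`) are hypothetical.

```python
import numpy as np


def part_score(x, ref, idx):
    """Euclidean distance from x to its nearest clean instance, using only attributes idx."""
    diff = ref[:, idx] - x[idx]
    return np.sqrt((diff ** 2).sum(axis=1)).min()


def nominal_threshold(ref, idx, quantile):
    """Quantile of leave-one-out nearest-neighbour distances among the clean instances."""
    sub = ref[:, idx]
    d = np.sqrt(((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)  # an instance is not its own neighbour
    return np.quantile(d.min(axis=1), quantile)


def localize(x, ref, idx=None, quantile=0.95, min_size=1):
    """Return the attribute indices flagged as corrupted (assumed top-down descent)."""
    if idx is None:
        idx = np.arange(x.shape[0])
    if part_score(x, ref, idx) <= nominal_threshold(ref, idx, quantile):
        return []  # part looks nominal, so its children are not tested
    if len(idx) <= min_size:
        return idx.tolist()  # anomalous leaf
    mid = len(idx) // 2  # binary split of the attribute range
    left = localize(x, ref, idx[:mid], quantile, min_size)
    right = localize(x, ref, idx[mid:], quantile, min_size)
    # If the part is anomalous but neither child is, flag the whole part.
    return (left + right) or idx.tolist()


def impute(x, ref, bad):
    """Copy flagged attributes from the clean instance nearest on the unflagged ones.

    Assumption: this nearest-neighbour fill stands in for the MAP estimator.
    """
    bad = np.asarray(bad, dtype=int)
    good = np.setdiff1d(np.arange(x.shape[0]), bad)
    out = x.copy()
    if bad.size == 0 or good.size == 0:
        return out
    nearest = np.argmin(((ref[:, good] - x[good]) ** 2).sum(axis=1))
    out[bad] = ref[nearest, bad]
    return out


# Toy usage: 10 clean instances with 8 attributes; row i holds the value 0.1 * i.
ref = 0.1 * np.arange(10)[:, None] * np.ones((10, 8))
x = ref[3].copy()
x[[4, 5]] = 50.0  # synthetic localized corruption
flagged = localize(x, ref)
restored = impute(x, ref, flagged)
```

In this toy run the descent stops early on the clean half of the attributes and only refines the half that contains the corruption, which is the practical appeal of a tree-structured search. The `quantile` parameter only loosely mirrors the directly settable false alarm rate mentioned in the abstract and carries no analytical guarantee here.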
Files
Original bundle
Name: Data Imputation Through the Identification of Local Anomalies.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format
Description: Full printable version