Sequential outlier detection based on incremental decision trees

buir.contributor.author: Gökçesu, Kaan
buir.contributor.author: Neyshabouri, Mohammadreza Mohaghegh
buir.contributor.author: Gökçesu, Hakan
buir.contributor.author: Kozat, Süleyman Serdar
dc.citation.epage: 1005
dc.citation.issueNumber: 4
dc.citation.spage: 993
dc.citation.volumeNumber: 67
dc.contributor.author: Gökçesu, Kaan
dc.contributor.author: Neyshabouri, Mohammadreza Mohaghegh
dc.contributor.author: Gökçesu, Hakan
dc.contributor.author: Kozat, Süleyman Serdar
dc.date.accessioned: 2020-02-05T06:22:22Z
dc.date.available: 2020-02-05T06:22:22Z
dc.date.issued: 2019
dc.department: Department of Electrical and Electronics Engineering
dc.description.abstract: We introduce an online outlier detection algorithm to detect outliers in a sequentially observed data stream. For this purpose, we use a two-stage filtering and hedging approach. In the first stage, we construct a multimodal probability density function to model the normal samples. In the second stage, given a new observation, we label it as an anomaly if the value of the aforementioned density function at the newly observed point is below a specified threshold. To construct our multimodal density function, we use an incremental decision tree to build a set of subspaces of the observation space. For each subspace represented on the tree, we train a single-component density function from the exponential family using the observations that fall inside that subspace. These single-component density functions are then adaptively combined to produce our multimodal density function, which is shown to achieve the performance of the best convex combination of the density functions defined on the subspaces. As we observe more samples, our tree grows and produces more subspaces. As a result, our modeling power increases over time, while mitigating overfitting issues. To choose the threshold level used to label the observations, we use an adaptive thresholding scheme. We show that our adaptive threshold level achieves the performance of the optimal fixed threshold level chosen with the observation labels known in hindsight. Our algorithm provides significant performance improvements over the state of the art in a wide set of experiments involving both synthetic and real data.
dc.identifier.doi: 10.1109/TSP.2018.2887406
dc.identifier.eissn: 1941-0476
dc.identifier.issn: 1053-587X
dc.identifier.uri: http://hdl.handle.net/11693/53071
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: https://doi.org/10.1109/TSP.2018.2887406
dc.source.title: IEEE Transactions on Signal Processing
dc.subject: Anomaly detection
dc.subject: Exponential family
dc.subject: Online learning
dc.subject: Mixture-of-experts
dc.title: Sequential outlier detection based on incremental decision trees
dc.type: Article
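
The abstract above describes the algorithm only at a high level. The following is a minimal Python sketch of the two-stage filtering-and-hedging idea, not the authors' implementation: it uses a single fixed split of the first coordinate in place of the incremental decision tree, diagonal Gaussians as the exponential-family components, exponentially weighted averaging for the hedging stage, and a simple online threshold update. The class names `NodeDensity` and `SequentialOutlierDetector`, the split point, and the learning rates `eta` and `gamma` are illustrative assumptions.

```python
import numpy as np

class NodeDensity:
    """Running diagonal-Gaussian estimate for the samples routed to one subspace
    (tree node). The paper allows any exponential-family component; a Gaussian
    is used here purely for illustration."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.ones(dim)              # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pdf(self, x):
        var = self.m2 / max(self.n, 1) + 1e-6
        z = (x - self.mean) ** 2 / var
        return np.exp(-0.5 * z.sum()) / np.sqrt((2 * np.pi * var).prod())


class SequentialOutlierDetector:
    """Two-stage sketch: (1) a mixture of subspace densities combined with
    exponential weights, (2) an adaptive threshold updated online.
    A fixed median-style split of coordinate 0 stands in for the
    incremental decision tree; hyperparameters are illustrative."""
    def __init__(self, dim, eta=0.1, gamma=0.01):
        self.experts = [NodeDensity(dim), NodeDensity(dim), NodeDensity(dim)]
        self.log_w = np.zeros(3)            # log mixture weights (root + 2 children)
        self.eta = eta                      # mixture learning rate
        self.gamma = gamma                  # threshold learning rate
        self.log_tau = -5.0                 # log of the decision threshold
        self.split = 0.0                    # hypothetical split point of coordinate 0

    def _route(self, x):
        # the root is always active, plus one child depending on the split
        return [0, 1 if x[0] <= self.split else 2]

    def score(self, x):
        """Mixture density value at x (higher = more normal)."""
        idx = self._route(x)
        w = np.exp(self.log_w[idx] - self.log_w[idx].max())
        w /= w.sum()
        return float(sum(wi * self.experts[i].pdf(x) for wi, i in zip(w, idx)))

    def step(self, x, label=None):
        """Label x as outlier/normal, then update densities, weights, threshold."""
        p = self.score(x)
        is_outlier = p < np.exp(self.log_tau)
        idx = self._route(x)
        # hedging stage: reward each active expert by its log-likelihood on x
        for i in idx:
            self.log_w[i] += self.eta * np.log(self.experts[i].pdf(x) + 1e-12)
        # density stage: only presumed-normal samples update the component models
        if not is_outlier:
            for i in idx:
                self.experts[i].update(x)
        # adaptive threshold: nudge it when feedback reveals a miss or false alarm
        if label is not None:
            self.log_tau += self.gamma * (1.0 if label and not is_outlier else
                                          -1.0 if is_outlier and not label else 0.0)
        return is_outlier
```

On a data stream, `step` is called once per observation; when no feedback label is supplied, the threshold stays at its current value and only the component densities and mixture weights are updated.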

Files

Original bundle
Name: Sequential_Outlier_Detection_Based_on_Incremental_Decision_Trees.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format