Sequential outlier detection based on incremental decision trees

buir.contributor.author: Gökçesu, Kaan
buir.contributor.author: Neyshabouri, Mohammadreza Mohaghegh
buir.contributor.author: Gökçesu, Hakan
buir.contributor.author: Kozat, Süleyman Serdar
dc.citation.epage: 1005
dc.citation.issueNumber: 4
dc.citation.spage: 993
dc.citation.volumeNumber: 67
dc.contributor.author: Gökçesu, Kaan
dc.contributor.author: Neyshabouri, Mohammadreza Mohaghegh
dc.contributor.author: Gökçesu, Hakan
dc.contributor.author: Kozat, Süleyman Serdar
dc.date.accessioned: 2020-02-05T06:22:22Z
dc.date.available: 2020-02-05T06:22:22Z
dc.date.issued: 2019
dc.department: Department of Electrical and Electronics Engineering
dc.description.abstract: We introduce an online outlier detection algorithm to detect outliers in a sequentially observed data stream. For this purpose, we use a two-stage filtering and hedging approach. In the first stage, we construct a multimodal probability density function to model the normal samples. In the second stage, given a new observation, we label it as an anomaly if the value of the aforementioned density function at the newly observed point is below a specified threshold. To construct our multimodal density function, we use an incremental decision tree to build a set of subspaces of the observation space. For each subspace represented on the tree, we train a single-component density function from the exponential family using the observations that fall inside that subspace. These single-component density functions are then adaptively combined to produce our multimodal density function, which is shown to achieve the performance of the best convex combination of the density functions defined on the subspaces. As we observe more samples, our tree grows and produces more subspaces. As a result, our modeling power increases over time, while mitigating overfitting issues. To choose the threshold level used to label the observations, we use an adaptive thresholding scheme. We show that our adaptive threshold level achieves the performance of the optimal fixed threshold level chosen with the observation labels known in hindsight. Our algorithm provides significant performance improvements over the state of the art in a wide set of experiments involving both synthetic and real data.
dc.identifier.doi: 10.1109/TSP.2018.2887406
dc.identifier.eissn: 1941-0476
dc.identifier.issn: 1053-587X
dc.identifier.uri: http://hdl.handle.net/11693/53071
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: https://doi.org/10.1109/TSP.2018.2887406
dc.source.title: IEEE Transactions on Signal Processing
dc.subject: Anomaly detection
dc.subject: Exponential family
dc.subject: Online learning
dc.subject: Mixture-of-experts
dc.title: Sequential outlier detection based on incremental decision trees
dc.type: Article
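
The abstract above describes the algorithm only at a high level. The following is a minimal Python sketch of the two-stage filtering-and-hedging idea, not the authors' implementation: it uses a single fixed split of the first coordinate in place of the incremental decision tree, diagonal Gaussians as the exponential-family components, exponentially weighted averaging for the hedging stage, and a simple online threshold update. The class names `NodeDensity` and `SequentialOutlierDetector`, the split point, and the learning rates `eta` and `gamma` are illustrative assumptions.

```python
import numpy as np

class NodeDensity:
    """Running diagonal-Gaussian estimate for the samples routed to one subspace
    (tree node). The paper allows any exponential-family component; a Gaussian
    is used here purely for illustration."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.ones(dim)              # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pdf(self, x):
        var = self.m2 / max(self.n, 1) + 1e-6
        z = (x - self.mean) ** 2 / var
        return np.exp(-0.5 * z.sum()) / np.sqrt((2 * np.pi * var).prod())


class SequentialOutlierDetector:
    """Two-stage sketch: (1) a mixture of subspace densities combined with
    exponential weights, (2) an adaptive threshold updated online.
    A fixed median-style split of coordinate 0 stands in for the
    incremental decision tree; hyperparameters are illustrative."""
    def __init__(self, dim, eta=0.1, gamma=0.01):
        self.experts = [NodeDensity(dim), NodeDensity(dim), NodeDensity(dim)]
        self.log_w = np.zeros(3)            # log mixture weights (root + 2 children)
        self.eta = eta                      # mixture learning rate
        self.gamma = gamma                  # threshold learning rate
        self.log_tau = -5.0                 # log of the decision threshold
        self.split = 0.0                    # hypothetical split point of coordinate 0

    def _route(self, x):
        # the root is always active, plus one child depending on the split
        return [0, 1 if x[0] <= self.split else 2]

    def score(self, x):
        """Mixture density value at x (higher = more normal)."""
        idx = self._route(x)
        w = np.exp(self.log_w[idx] - self.log_w[idx].max())
        w /= w.sum()
        return float(sum(wi * self.experts[i].pdf(x) for wi, i in zip(w, idx)))

    def step(self, x, label=None):
        """Label x as outlier/normal, then update densities, weights, threshold."""
        p = self.score(x)
        is_outlier = p < np.exp(self.log_tau)
        idx = self._route(x)
        # hedging stage: reward each active expert by its log-likelihood on x
        for i in idx:
            self.log_w[i] += self.eta * np.log(self.experts[i].pdf(x) + 1e-12)
        # density stage: only presumed-normal samples update the component models
        if not is_outlier:
            for i in idx:
                self.experts[i].update(x)
        # adaptive threshold: nudge it when feedback reveals a miss or false alarm
        if label is not None:
            self.log_tau += self.gamma * (1.0 if label and not is_outlier else
                                          -1.0 if is_outlier and not label else 0.0)
        return is_outlier
```

On a data stream, `step` is called once per observation; when no feedback label is supplied, the threshold stays at its current value and only the component densities and mixture weights are updated.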

Files

Original bundle
Name: Sequential_Outlier_Detection_Based_on_Incremental_Decision_Trees.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format