Iterative estimation of Robust Gaussian mixture models in heterogeneous data sets

buir.advisorAksoy, Selim
dc.contributor.authorMercan, Caner
dc.date.accessioned2016-04-18T14:06:46Z
dc.date.available2016-04-18T14:06:46Z
dc.date.copyright2014-07
dc.date.issued2014-07
dc.date.submitted2015-07-15
dc.descriptionIncludes bibliographical references (pages 68-73).en_US
dc.descriptionThesis (M.S.): Bilkent University, The Department of Computer Engineering and the Graduate School of Engineering and Science, 2014.en_US
dc.descriptionCataloged from PDF version of thesis.en_US
dc.description.abstractDensity estimation is the process of estimating the parameters of a probability density function from data. The Gaussian mixture model (GMM) is one of the most preferred density families. We study the estimation of a Gaussian mixture from a heterogeneous data set that is defined as the set of points that contains interesting points that are sampled from a mixture of Gaussians as well as non-Gaussian distributed uninteresting ones. The traditional GMM estimation techniques such as the Expectation-Maximization algorithm cannot effectively model the interesting points in a heterogeneous data set due to their sensitivity to the uninteresting points as outliers. Another potential problem is that the true number of components should often be known a priori for a good estimation. We propose a GMM estimation algorithm that iteratively estimates the number of interesting points, the number of Gaussians in the mixture, and the actual mixture parameters while being robust to the presence of uninteresting points in heterogeneous data. The procedure is designed so that one Gaussian component is estimated using a robust formulation at each iteration. The number of interesting points that belong to this component is also estimated using a multi-resolution search procedure among a set of candidates. If a hypothesis on the Gaussianity of these points is accepted, the estimated Gaussian is kept as a component in the mixture, the associated points are removed from the data set, and the iterations continue with the remaining points. Otherwise, the estimation process is terminated and the remaining points are labeled as uninteresting. Thus, the stopping criterion helps to identify the true number of components without any additional information. 
Comparative experiments on synthetic and real-world data sets show that our algorithm can identify the true number of components and can produce a better density estimate in terms of log-likelihood compared to two other algorithms.en_US
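The abstract describes the procedure only at a high level. As a rough illustration of the loop it outlines (estimate one component, choose how many points belong to it, test Gaussianity, then either remove those points and continue or stop), here is a minimal one-dimensional Python sketch. It is not the thesis's method: the robust one-component fit is replaced by a shorth-style mode seek around which the k nearest points are taken, the multi-resolution search is assumed to be a coarse-to-fine grid over k that minimizes the Jarque-Bera statistic, and the Gaussianity hypothesis is assumed to be a Jarque-Bera test at the 5% level. All names (fit_iterative, search_size, robust_center, make_demo_data) and the minimum component size k_min = 120 are hypothetical.

```python
import statistics
from dataclasses import dataclass
from statistics import NormalDist

# Assumption: a Jarque-Bera test stands in for the Gaussianity hypothesis test.
# 5.991 is the 95% quantile of the chi-square distribution with 2 degrees of freedom.
JB_CRITICAL = 5.991


@dataclass
class Component:
    mean: float
    std: float
    count: int  # number of points assigned to this component as "interesting"


def jarque_bera(xs):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4); large values reject Gaussianity."""
    n = len(xs)
    mu = statistics.fmean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n
    if m2 == 0:
        return float("inf")  # degenerate sample: treat as non-Gaussian
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def robust_center(xs, frac=0.1):
    """Median of the shortest window holding `frac` of the points (a shorth-style mode seek)."""
    s = sorted(xs)
    m = max(2, int(frac * len(s)))
    i = min(range(len(s) - m + 1), key=lambda j: s[j + m - 1] - s[j])
    return statistics.median(s[i:i + m])


def search_size(ordered, k_min, divisions=32):
    """Coarse-to-fine search over candidate sizes k, keeping the k with the smallest JB value."""
    lo, hi = k_min, len(ordered)
    step = max(1, (hi - lo) // divisions)
    while True:
        best = min(range(lo, hi + 1, step), key=lambda k: jarque_bera(ordered[:k]))
        if step == 1:
            return best
        # Zoom in around the current best candidate with a finer step.
        lo, hi = max(k_min, best - step), min(len(ordered), best + step)
        step = max(1, step // 4)


def fit_iterative(points, k_min=120):
    """Estimate one Gaussian per iteration; stop when the best candidate fails the test."""
    remaining, components = list(points), []
    while len(remaining) >= k_min:
        c = robust_center(remaining)
        ordered = sorted(remaining, key=lambda x: abs(x - c))  # nearest to the center first
        k = search_size(ordered, k_min)
        chosen = ordered[:k]
        if jarque_bera(chosen) >= JB_CRITICAL:
            break  # Gaussianity rejected: the remaining points are labeled uninteresting
        components.append(Component(statistics.fmean(chosen), statistics.pstdev(chosen), k))
        remaining = ordered[k:]  # remove the accepted points and continue
    return components, remaining


def make_demo_data():
    """Hypothetical demo: two Gaussians (quantile samples) plus evenly spaced background points."""
    a = [NormalDist(0, 1).inv_cdf((i + 0.5) / 200) for i in range(200)]
    b = [NormalDist(25, 2).inv_cdf((i + 0.5) / 200) for i in range(200)]
    background = [-100 + i * 225 / 149 for i in range(150)]
    return a + b + background


if __name__ == "__main__":
    comps, rest = fit_iterative(make_demo_data())
    for comp in comps:
        print(f"mean={comp.mean:.2f} std={comp.std:.2f} n={comp.count}")
    print(f"{len(rest)} points labeled uninteresting")
```

Minimizing the statistic, rather than taking the largest accepted k, is a deliberate choice in this sketch: a small k gives a truncated central slice with too-light tails, while a large k pulls in background points that add heavy tails. The minimum component size also matters, because the Jarque-Bera test has little power on small samples, so with a much smaller k_min a handful of background points could pass as a Gaussian.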
dc.description.provenanceSubmitted by Dilek Doğanoğlu (doral@bilkent.edu.tr) on 2016-04-18T14:06:46Z No. of bitstreams: 2 CanerMercan_thesis.pdf: 1207911 bytes, checksum: 4a327be7c7892a130362623f3a998d26 (MD5) CanerMercan_thesis.pdf: 1207911 bytes, checksum: 4a327be7c7892a130362623f3a998d26 (MD5)en
dc.description.provenanceMade available in DSpace on 2016-04-18T14:06:46Z (GMT). No. of bitstreams: 2 CanerMercan_thesis.pdf: 1207911 bytes, checksum: 4a327be7c7892a130362623f3a998d26 (MD5) CanerMercan_thesis.pdf: 1207911 bytes, checksum: 4a327be7c7892a130362623f3a998d26 (MD5) Previous issue date: 2014-07en
dc.description.statementofresponsibilityCaner Mercanen_US
dc.format.extentxiv, 73 leaves, charts, graphics.en_US
dc.identifier.itemidB147896
dc.identifier.urihttp://hdl.handle.net/11693/28945
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectGaussian Mixture modelen_US
dc.subjectRobust Gaussian estimationen_US
dc.subjectIdentifying number of mixture componentsen_US
dc.subjectIterative Gaussian mixture estimationen_US
dc.titleIterative estimation of Robust Gaussian mixture models in heterogeneous data setsen_US
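dc.title.alternativeGauss karışım modellerinin türdeş olmayan veri öbeklerinde yinelemeli kestirimi (Turkish title; English: Iterative estimation of Gaussian mixture models in heterogeneous data sets)en_US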
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)

Files

Original bundle

Name: CanerMercan_thesis.pdf
Size: 1.15 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission