dc.contributor.advisor | Aksoy, Selim | |
dc.contributor.author | Mercan, Caner | |
dc.date.accessioned | 2016-04-18T14:06:46Z | |
dc.date.available | 2016-04-18T14:06:46Z | |
dc.date.copyright | 2014-07 | |
dc.date.issued | 2014-07 | |
dc.date.submitted | 15-07-2015 | |
dc.identifier.uri | http://hdl.handle.net/11693/28945 | |
dc.description | Includes bibliographical references (pages 68-73). | en_US |
dc.description | Thesis (M.S.): Bilkent University, The Department of Computer Engineering and the Graduate School of Engineering and Science, 2014. | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description.abstract | Density estimation is the process of estimating the parameters of a probability
density function from data. The Gaussian mixture model (GMM) is one of the
most preferred density families. We study the estimation of a Gaussian mixture
from a heterogeneous data set that is defined as the set of points that contains
interesting points that are sampled from a mixture of Gaussians as well as
non-Gaussian distributed uninteresting ones. The traditional GMM estimation
techniques such as the Expectation-Maximization algorithm cannot effectively
model the interesting points in a heterogeneous data set due to their sensitivity
to the uninteresting points as outliers. Another potential problem is that the
true number of components should often be known a priori for a good estimation.
We propose a GMM estimation algorithm that iteratively estimates the
number of interesting points, the number of Gaussians in the mixture, and the
actual mixture parameters while being robust to the presence of uninteresting
points in heterogeneous data. The procedure is designed so that one Gaussian
component is estimated using a robust formulation at each iteration. The number
of interesting points that belong to this component is also estimated using a
multi-resolution search procedure among a set of candidates. If a hypothesis on
the Gaussianity of these points is accepted, the estimated Gaussian is kept as a
component in the mixture, the associated points are removed from the data set,
and the iterations continue with the remaining points. Otherwise, the estimation
process is terminated and the remaining points are labeled as uninteresting.
Thus, the stopping criterion helps to identify the true number of components
without any additional information. Comparative experiments on synthetic and
real-world data sets show that our algorithm can identify the true number of
components and can produce a better density estimate in terms of log-likelihood
compared to two other algorithms. | en_US |
dc.description.statementofresponsibility | Caner Mercan | en_US |
dc.format.extent | xiv, 73 leaves, charts, graphics. | en_US |
dc.language.iso | English | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Gaussian Mixture model | en_US |
dc.subject | Robust Gaussian estimation | en_US |
dc.subject | Identifying number of mixture components | en_US |
dc.subject | Iterative Gaussian mixture estimation | en_US |
dc.title | Iterative estimation of Robust Gaussian mixture models in heterogeneous data sets | en_US |
dc.title.alternative | Gauss karışım modellerinin türdeş olmayan veri öbeklerinde yinelemeli kestirimi | en_US |
dc.type | Thesis | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.publisher | Bilkent University | en_US |
dc.description.degree | M.S. | en_US |
dc.identifier.itemid | B147896 | |