Browsing by Subject "K-means"
Now showing 1 - 2 of 2
Item | Open Access
Signal and image processing algorithms for agricultural applications (2006)
Dülek, Berkan

Medical studies indicate that acrylamide causes cancer in animals and that certain doses of acrylamide are toxic to the nervous system of both animals and humans. Acrylamide is produced in carbohydrate foods prepared at high temperatures, such as fried potatoes. For this reason, it is crucial for human health to quantitatively measure the amount of acrylamide formed as a result of prolonged cooking at high temperatures. In this thesis, a correlation is demonstrated between measured acrylamide concentrations and NABY (Normalized Area of Brownish Yellow regions) values estimated from the surface color properties of fried potato images using a modified form of the k-means algorithm. The same method is used to estimate acrylamide levels of roasted coffee beans. The proposed method appears to be a promising approach for estimating acrylamide levels and can find applications in industrial systems.

The quality and price of hazelnuts are mainly determined by the ratio of shell weight to kernel weight. Due to a number of physiological and physical disorders, hazelnuts may grow without fully developed kernels. We previously proposed a prototype system that detects empty hazelnuts by dropping them onto a steel plate and processing the acoustic signal generated when the kernels hit the plate. In that study, feature vectors describing the time- and frequency-domain characteristics of the impact sound were extracted from the acoustic signal and classified using Support Vector Machines. In the second part of this thesis, a feature-domain post-processing method based on vector median/mean filtering is shown to further improve these classification results.

Item | Open Access
Universal lower bounds and optimal rates: achieving minimax clustering error in sub-exponential mixture models (ML Research Press, 2024-07-03)
Dreveton, Maximilien; Gözeten, Alperen; Grossglauser, Matthias; Thiran, Patrick; Agrawal, S.; Roth, A. (Eds.)

Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through Chernoff information, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
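As an illustration of the color-based estimation described in the first item above, the following Python sketch clusters pixel colors with plain k-means and reports the fraction of the imaged surface assigned to brownish-yellow clusters. The thesis uses a modified k-means whose modifications are not described in the abstract, so the function name estimate_naby, the cluster count, and the color thresholds below are illustrative assumptions rather than the author's method.

    # Hypothetical sketch: estimate a NABY-like value by clustering pixel colors
    # with plain k-means and measuring the area of brownish-yellow clusters.
    # The thesis uses a *modified* k-means; this is only an approximation.
    import numpy as np
    from sklearn.cluster import KMeans

    def estimate_naby(image_rgb: np.ndarray, n_clusters: int = 8) -> float:
        """image_rgb: H x W x 3 uint8 array of a segmented fried-potato surface."""
        pixels = image_rgb.reshape(-1, 3).astype(float)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
        centers = km.cluster_centers_          # one RGB color per cluster
        labels = km.labels_

        # Crude "brownish yellow" test on cluster centers (assumed thresholds):
        # strong red, moderate green, little blue.
        r, g, b = centers[:, 0], centers[:, 1], centers[:, 2]
        brownish_yellow = (r > 120) & (g > 60) & (g < r) & (b < 0.6 * g)

        # Normalized area = fraction of surface pixels assigned to such clusters.
        return float(np.isin(labels, np.where(brownish_yellow)[0]).mean())

A real system would additionally segment the potato or coffee-bean surface from the background and calibrate the thresholds against measured acrylamide concentrations.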
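The second item refers to Bregman hard clustering, a Lloyd-style iteration in which points are assigned to the center minimizing a Bregman divergence. Below is a minimal sketch, assuming the generalized KL divergence (the Bregman divergence suited to Poisson-like count data); the function names, initialization, and stopping rule are ours, not the paper's.

    # Minimal sketch of Bregman hard clustering: Lloyd-style iterations with a
    # Bregman divergence (here the generalized KL / I-divergence).
    import numpy as np

    def kl_divergence(x, mu, eps=1e-12):
        """Generalized KL divergence d(x, mu), summed over coordinates."""
        x = np.maximum(x, eps)
        mu = np.maximum(mu, eps)
        return np.sum(x * np.log(x / mu) - x + mu, axis=-1)

    def bregman_hard_clustering(X, k, n_iter=100, seed=0):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        labels = np.full(len(X), -1)
        for _ in range(n_iter):
            # Assignment step: nearest center in Bregman divergence.
            d = kl_divergence(X[:, None, :], centers[None, :, :])  # shape (n, k)
            new_labels = d.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Update step: the cluster mean minimizes total Bregman divergence.
            for j in range(k):
                mask = labels == j
                if mask.any():
                    centers[j] = X[mask].mean(axis=0)
        return labels, centers

Substituting the squared Euclidean distance for kl_divergence recovers standard Lloyd's algorithm; the mean-update step is unchanged because the mean is the minimizer for any Bregman divergence.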