Universal lower bounds and optimal rates: achieving minimax clustering error in sub-exponential mixture models

buir.contributor.authorGözeten, Alperen
dc.citation.epage1485
dc.citation.spage1451
dc.citation.volumeNumber247
dc.contributor.authorDreveton, Maximilien
dc.contributor.authorGözeten, Alperen
dc.contributor.authorGrossglauser, Matthias
dc.contributor.authorThiran, Patrick
dc.contributor.editorAgrawal S., Roth A.
dc.coverage.spatialEdmonton, Canada
dc.date.accessioned2025-02-28T06:34:29Z
dc.date.available2025-02-28T06:34:29Z
dc.date.issued2024-07-03
dc.departmentDepartment of Computer Engineering
dc.descriptionConference Name: 37th Annual Conference on Learning Theory, COLT 2024
dc.descriptionDate of Conference: 30 June 2024 - 3 July 2024
dc.description.abstractClustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through Chernoff information, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
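The abstract describes Bregman hard clustering as a variant of Lloyd's algorithm in which the squared Euclidean distance is replaced by a Bregman divergence matched to the mixture's exponential family. The following is a minimal illustrative sketch of that alternation (not the authors' implementation); the function names and the choice of divergences are assumptions for illustration. A key property it relies on is that, for any Bregman divergence, the arithmetic mean of a cluster is the divergence-minimizing centroid, so the update step is identical to Lloyd's.

```python
import numpy as np

def bregman_hard_clustering(X, k, divergence, n_iter=50, seed=0):
    """Lloyd-style alternation with a Bregman divergence (illustrative sketch).

    Assign each point to the cluster whose center minimizes the given
    divergence, then recompute each center as the cluster mean (the mean
    is the optimal centroid for every Bregman divergence)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Pairwise divergences: shape (n_points, k)
        d = divergence(X[:, None, :], centers[None, :, :])
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def sq_euclidean(x, c):
    # Bregman divergence of phi(x) = ||x||^2: recovers ordinary k-means.
    return ((x - c) ** 2).sum(axis=-1)

def gen_kl(x, c):
    # Generalized KL divergence, the Bregman divergence of phi(x) = x log x,
    # matched to Poisson-distributed data (small eps guards log(0)).
    eps = 1e-12
    return (x * np.log((x + eps) / (c + eps)) - x + c).sum(axis=-1)
```

With `divergence=sq_euclidean` this reduces to Lloyd's algorithm for (sub-)Gaussian mixtures; swapping in `gen_kl` adapts the same iteration to Poisson-type count data, which is the sense in which the paper's rate-optimality result covers exponential-family mixtures.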
dc.identifier.doi10.48550/arXiv.2402.15432
dc.identifier.issn26403498
dc.identifier.urihttps://hdl.handle.net/11693/116961
dc.language.isoEnglish
dc.publisherML Research Press
dc.relation.isversionofhttps://dx.doi.org/10.48550/arXiv.2402.15432
dc.subjectClustering
dc.subjectIterative algorithms
dc.subjectK-means
dc.subjectMixture models
dc.titleUniversal lower bounds and optimal rates: achieving minimax clustering error in sub-exponential mixture models
dc.typeConference Paper

Files

Original bundle

Name: Universal_Lower_Bounds_and_Optimal_Rates_Achieving_Minimax_Clustering_Error_in_Sub-Exponential_Mixture_Models.pdf
Size: 389.39 KB
Format: Adobe Portable Document Format