Data decomposition techniques for parallel tree-based k-means clustering
Author
Şen, Cenk
Advisor
Gürsoy, Atilla
Date
2002
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
The main computation in k-means clustering is the calculation of distances between
cluster centroids and patterns. As the number of patterns and the number of centroids
increase, the time needed to complete the computations also increases. This computational load requires
high performance computers and/or algorithmic improvements. The parallel tree-based
k-means algorithm on distributed memory machines combines algorithmic improvements
with the high computation capacity of parallel computers to deal with huge datasets. Its
performance is affected by the data decomposition technique used. In this thesis, we present
novel data decomposition techniques to improve the performance of the parallel tree-based
k-means algorithm on distributed memory machines. The proposed tree-based decomposition
techniques try to decrease the total number of distance calculations by assigning
compact subspaces to processors. Compact subspaces improve the performance of the
pruning function of the tree-based k-means algorithm. We have implemented the algorithm
and conducted experiments on a PC cluster. Our experimental results demonstrate that
the tree-based decomposition technique outperforms the random and stripwise
decomposition techniques.
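
The link between compact subspaces and pruning can be illustrated with a minimal sketch. It assumes a generic min/max-distance pruning rule of the kind commonly used in tree-based k-means: a centroid is discarded for a bounding box when its smallest possible distance to the box exceeds the largest possible distance of some other centroid. This rule is not necessarily the exact pruning function used in the thesis, and the function names, centroids, and boxes below are hypothetical.

```python
import math

def min_max_dist(box_lo, box_hi, c):
    """Smallest and largest Euclidean distance from point c to an axis-aligned box."""
    dmin2 = dmax2 = 0.0
    for lo, hi, x in zip(box_lo, box_hi, c):
        nearest = min(max(x, lo), hi)                        # clamp c onto the box
        farthest = lo if abs(x - lo) > abs(x - hi) else hi   # farther box face
        dmin2 += (x - nearest) ** 2
        dmax2 += (x - farthest) ** 2
    return math.sqrt(dmin2), math.sqrt(dmax2)

def surviving_centroids(box_lo, box_hi, centroids):
    """Indices of centroids that could still be nearest to some point in the box."""
    bounds = [min_max_dist(box_lo, box_hi, c) for c in centroids]
    best_max = min(dmax for _, dmax in bounds)
    # A centroid whose minimum distance exceeds best_max can never be the nearest.
    return [i for i, (dmin, _) in enumerate(bounds) if dmin <= best_max]

# Hypothetical data: three centroids in a 10 x 10 space.
centroids = [(1.0, 1.0), (9.0, 1.0), (5.0, 9.0)]

compact = surviving_centroids((0.0, 0.0), (2.0, 2.0), centroids)   # small square box
strip = surviving_centroids((0.0, 0.0), (10.0, 2.0), centroids)    # wide strip

print(len(compact), len(strip))
```

With these hypothetical values, only one candidate centroid survives for the compact box, while all three survive for the wide strip, so every pattern in the strip would need more distance calculations.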