Data decomposition techniques for parallel tree-based k-means clustering

Şen, Cenk

Data decomposition techniques for parallel tree-based k-means clustering

Files

0002066.pdf (1.75 MB)

Date

2002

Authors

Şen, Cenk

Advisor

Gürsoy, Atilla

BUIR Usage Stats

3
views

18
downloads

Abstract

The main computation in the k-means clustering is distance calculations between cluster centroids and patterns. As the number of the patterns and the number of centroids increases, time needed to complete computations increased. This computational load requires high performance computers and/or algorithmic improvements. The parallel tree-based k-means algorithm on distributed memory machines combines the algorithmic improvements and high computation capacity of the parallel computers to deal with huge datasets. Its performance is affected by the data decomposition technique used. In this thesis, we presented novel data decomposition technique to improve the performance of the parallel tree-based k-means algorithm on distributed memory machines. Proposed tree-based decomposition techniques try to decrease the total number of the distance calculations by assigning processors compact subspaces. The compact subspace improves the performance of the pruning function of the tree-based k-means algorithm. We have implemented the algorithm and have conducted experiments on a PC cluster. Our experimental results demonstrated that the tree-based decomposition technique outperforms the random decomposition and stripwise decomposition techniques.

Keywords

Clustering, Parallel algorithm, Load balancing, Data decomposition

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/15853

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis

Full item page

Data decomposition techniques for parallel tree-based k-means clustering

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats

Series

Abstract

Source Title

Publisher

Course

Other identifiers

Book Title

Keywords

Degree Discipline

Degree Level

Degree Name

Citation

Permalink

Published Version (Please cite this version)

Collections

Language

Type

Data decomposition techniques for parallel tree-based k-means clustering

Files

Date

Authors

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats

Share

Series

Abstract

Source Title

Publisher

Course

Other identifiers

Book Title

Keywords

Degree Discipline

Degree Level

Degree Name

Citation

Permalink

Published Version (Please cite this version)

Collections

Language

Type