Efficient parallel frequency mining based on a novel top-down partitioning scheme for transactional data

Özkural, Eray

Files

0002014.pdf (3.66 MB)

Date

2002

Authors

Özkural, Eray

Advisor

Aykanat, Cevdet

Abstract

In recent years, large quantities of data have been amassed with advances in data acquisition capabilities. Automated detection of useful information is required for the vast data obtained from scientific and business domains. Data mining is the application of efficient algorithmic solutions to a variety of immense data for such knowledge discovery. Frequency mining discovers all frequent patterns in a transaction or relational database, and it comprises the core of several data mining algorithms such as association rule mining and sequence mining. Frequent pattern discovery has become a challenge for parallel programming since it is a highly complex operation on huge datasets, demanding efficient and scalable algorithms. In this thesis, we propose a new family of parallel frequency mining algorithms. We introduce a novel transaction set partitioning scheme that can be used to divide the frequency mining task in a top-down fashion. The method operates on the graph of frequent patterns with length two (Gp2), from which a graph partitioning by vertex separator (GPVS) is mapped to a two-way partitioning of the transaction set. The two parts obtained can be mined independently and can therefore be utilized for concurrency. For this property to hold, an amount of replication is required, which is dictated by the separator in Gp2 and minimized by the GPVS algorithm. A k-way partitioning is derived from recursive application of the two-way partitioning scheme, and it is used in the design of a generic parallel frequency mining algorithm. First, we compute Gp2 in parallel; we then designate a k-way partitioning of the database for k processors with a parallel recursive procedure. The database is redistributed such that each processor is assigned one part. Subsequent mining proceeds simultaneously and independently at each processor with a given serial mining algorithm.
A complete implementation in which we employ FP-Growth as the sequential algorithm has been achieved. The performance study of the algorithm on a Beowulf system demonstrates favorable performance for synthetic databases. For hard instances of the problem, we have gained approximately twice the speedup of a state-of-the-art algorithm. We also present a correction and an optimization to the FP-Growth algorithm.
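The two-way partitioning idea in the abstract can be illustrated with a small Python sketch. This is not the thesis implementation: the function names, the toy database, and the hand-picked separator are all assumptions made for illustration, a brute-force miner stands in for FP-Growth, and the mapping from separator to transaction parts (projecting each transaction onto one side plus the separator) is one plausible reading of the scheme described above.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner; a stand-in for a serial algorithm such as FP-Growth."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            count = sum(1 for t in transactions if set(cand) <= t)
            if count >= min_support:
                result[frozenset(cand)] = count
                found = True
        if not found:  # no frequent k-itemsets implies no larger ones
            break
    return result


def frequent_pair_graph(transactions, min_support):
    """Edges of the graph of frequent patterns of length two."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_support}


def split_by_separator(transactions, edges, part_a, part_b, separator):
    """Project each transaction onto (A + S) and (B + S).

    Assumes A, B and S together cover every frequent item.
    """
    for u, v in edges:
        crosses = (u in part_a and v in part_b) or (u in part_b and v in part_a)
        assert not crosses, "S is not a vertex separator for A and B"
    left_items = part_a | separator
    right_items = part_b | separator
    left = [t & left_items for t in transactions if t & left_items]
    right = [t & right_items for t in transactions if t & right_items]
    return left, right


# Hypothetical toy database; items are single letters.
transactions = [
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "c"},
    {"c", "d", "e"}, {"c", "d", "e"}, {"d", "e"},
    {"a", "c", "d"},
]
min_support = 2

edges = frequent_pair_graph(transactions, min_support)
# Hand-picked separator: removing "c" disconnects {a, b} from {d, e}.
left, right = split_by_separator(
    transactions, edges, part_a={"a", "b"}, part_b={"d", "e"}, separator={"c"}
)

# Each part can be mined independently, for example on separate processors.
left_result = frequent_itemsets(left, min_support)
right_result = frequent_itemsets(right, min_support)
merged = {**left_result, **right_result}

full = frequent_itemsets(transactions, min_support)
assert merged == full  # the two parts together recover every frequent pattern
```

Every frequent pattern is a clique in the graph of frequent pairs, so it cannot contain items from both sides of the separator; that is why the two projected parts can be mined without communication. The only patterns found twice are those made entirely of separator items (here just the item "c"), which reflects the replication cost that a smaller separator reduces.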

Keywords

Parallel data mining, Frequency mining

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/15419

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
