Efficient parallel frequency mining based on a novel top-down partitioning scheme for transactional data

Özkural, Eray

buir.advisor	Aykanat, Cevdet
dc.contributor.author	Özkural, Eray
dc.date.accessioned	2016-01-08T18:18:17Z
dc.date.available	2016-01-08T18:18:17Z
dc.date.copyright	2002
dc.date.issued	2002
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 91-99).	en_US
dc.description.abstract	In recent years, large quantities of data have been amassed with advances in data acquisition capabilities. Automated detection of useful information is required for the vast data obtained from scientific and business domains. Data mining is the application of efficient algorithmic solutions to a variety of immense data for such knowledge discovery. Frequency mining discovers all frequent patterns in a transaction or relational database, and it forms the core of several data mining algorithms such as association rule mining and sequence mining. Frequent pattern discovery has become a challenge for parallel programming since it is a highly complex operation on huge datasets, demanding efficient and scalable algorithms. In this thesis, we propose a new family of parallel frequency mining algorithms. We introduce a novel transaction set partitioning scheme that can be used to divide the frequency mining task in a top-down fashion. The method operates on the graph of frequent patterns with length two (Gp2), from which a graph partitioning by vertex separator (GPVS) is mapped to a two-way partitioning of the transaction set. The two parts obtained can be mined independently and can therefore be exploited for concurrency. For this property to hold, there is an amount of replication dictated by the separator in Gp2, which is minimized by the GPVS algorithm. A k-way partitioning is derived from recursive application of the 2-way partitioning scheme, which is used in the design of a generic parallel frequency mining algorithm. First, we compute Gp2 in parallel; we then designate a k-way partitioning of the database for k processors with a parallel recursive procedure. The database is redistributed such that each processor is assigned one part. Subsequent mining proceeds simultaneously and independently at each processor with a given serial mining algorithm. A complete implementation in which we employ FP-Growth as the sequential algorithm has been achieved. The performance study of the algorithm on a Beowulf system demonstrates favorable performance for synthetic databases. For hard instances of the problem, we have gained approximately twice the speedup of a state-of-the-art algorithm. We also present a correction and optimization to the FP-Growth algorithm.
dc.description.statementofresponsibility	by Eray Özkural	en_US
dc.format.extent	xvii, 106 leaves ; 30 cm.	en_US
dc.identifier.itemid	BILKUTUPB062533
dc.identifier.uri	http://hdl.handle.net/11693/15419
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Parallel data mining
dc.subject	Frequency mining
dc.title	Efficient parallel frequency mining based on a novel top-down partitioning scheme for transactional data	en_US
dc.title.alternative	Yeni bir işlem verisi parçalama şeması tabanlı etkin bir paralel frekans tarama
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
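
The two-way partitioning idea summarized in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the toy transactions, and the hand-picked separator are all assumptions made for illustration, a brute-force miner stands in for FP-Growth, and no GPVS algorithm is run. The sketch relies only on the property stated in the abstract: if a separator S splits the vertices of the frequent-pair graph into parts A and B with no edge between them, no frequent pattern can mix items of A and B, so the transactions projected onto A ∪ S and onto B ∪ S can be mined independently, at the cost of replicating the separator items.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_graph(transactions, min_support):
    """Return (frequent items, frequent 2-itemsets) as vertices and edges."""
    item_counts = Counter(item for t in transactions for item in set(t))
    vertices = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for t in transactions:
        pair_counts.update(combinations(sorted(set(t) & vertices), 2))
    edges = {p for p, c in pair_counts.items() if c >= min_support}
    return vertices, edges


def is_vertex_separator(edges, part_a, part_b):
    """True if no edge joins part_a and part_b directly."""
    return not any(
        (u in part_a and v in part_b) or (u in part_b and v in part_a)
        for u, v in edges
    )


def split_transactions(transactions, part_a, part_b, separator):
    """Project every transaction onto A ∪ S and onto B ∪ S (S is replicated)."""
    keep_a, keep_b = part_a | separator, part_b | separator
    side_a = [set(t) & keep_a for t in transactions]
    side_b = [set(t) & keep_b for t in transactions]
    return [t for t in side_a if t], [t for t in side_b if t]


def mine_bruteforce(transactions, min_support):
    """Stand-in serial miner (exponential; for tiny examples only)."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return {p: c for p, c in counts.items() if c >= min_support}


# Hypothetical toy database; the pair ("a", "c") occurs once and is infrequent.
transactions = [
    {"a", "b", "s"}, {"a", "b", "s"}, {"a", "s"},
    {"c", "d", "s"}, {"c", "d", "s"}, {"c", "s"},
    {"a", "c"},
]
min_support = 2

vertices, edges = frequent_pairs_graph(transactions, min_support)

# Hand-picked partition (an assumption; a GPVS tool would compute this).
part_a, part_b, separator = {"a", "b"}, {"c", "d"}, {"s"}
separator_ok = is_vertex_separator(edges, part_a, part_b)

side_a, side_b = split_transactions(transactions, part_a, part_b, separator)

# Each side could be mined on a different processor; here they run in turn.
merged = {**mine_bruteforce(side_a, min_support),
          **mine_bruteforce(side_b, min_support)}
whole = mine_bruteforce(transactions, min_support)
```

In this toy case the patterns mined from the two sides, once merged, coincide with those mined from the whole database, and only the separator item is processed on both sides. Applying the same split recursively to each side would give a k-way partitioning in the spirit of the abstract.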

Files

Original bundle

Name: 0002014.pdf
Size: 3.66 MB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science