Efficient parallel frequency mining based on a novel top-down partitioning scheme for transactional data

buir.advisorAykanat, Cevdet
dc.contributor.authorÖzkural, Eray
dc.date.accessioned2016-01-08T18:18:17Z
dc.date.available2016-01-08T18:18:17Z
dc.date.copyright2002
dc.date.issued2002
dc.descriptionAnkara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2002.en_US
dc.descriptionThesis (Master's) -- Bilkent University, 2002.en_US
dc.descriptionIncludes bibliographical references (leaves 91-99).en_US
dc.descriptionCataloged from PDF version of article.
dc.description.abstractIn recent years, large quantities of data have been amassed with advances in data acquisition capabilities. Automated detection of useful information is required for the vast data obtained from scientific and business domains. Data mining is the application of efficient algorithmic solutions to a variety of immense data for such knowledge discovery. Frequency mining discovers all frequent patterns in a transaction or relational database, and it forms the core of several data mining algorithms such as association rule mining and sequence mining. Frequent pattern discovery has become a challenge for parallel programming since it is a highly complex operation on huge datasets, demanding efficient and scalable algorithms. In this thesis, we propose a new family of parallel frequency mining algorithms. We introduce a novel transaction set partitioning scheme that can be used to divide the frequency mining task in a top-down fashion. The method operates on the graph of frequent patterns of length two (Gp2), from which a graph partitioning by vertex separator (GPVS) is mapped to a two-way partitioning of the transaction set. The two parts obtained can be mined independently and can therefore be exploited for concurrency. For this property to hold, a certain amount of replication is required; it is dictated by the separator in Gp2 and is minimized by the GPVS algorithm. A k-way partitioning is derived by recursive application of the two-way partitioning scheme and is used in the design of a generic parallel frequency mining algorithm. First we compute Gp2 in parallel; then we determine a k-way partitioning of the database for k processors with a parallel recursive procedure. The database is redistributed such that each processor is assigned one part. Subsequent mining proceeds simultaneously and independently at each processor with a given serial mining algorithm.
A complete implementation in which we employ FP-Growth as the sequential algorithm has been achieved. The performance study of the algorithm on a Beowulf system demonstrates favorable performance for synthetic databases. For hard instances of the problem, we have gained approximately twice the speedup of a state-of-the-art algorithm. We also present a correction and an optimization to the FP-Growth algorithm.
dc.description.provenanceMade available in DSpace on 2016-01-08T18:18:17Z (GMT). No. of bitstreams: 1 0002014.pdf: 3835234 bytes, checksum: 9fad4c1c5203a597dc28f2a878cc1471 (MD5)en
dc.description.statementofresponsibilityby Eray Özkuralen_US
dc.format.extentxvii, 106 leaves ; 30 cm.en_US
dc.identifier.itemidBILKUTUPB062533
dc.identifier.urihttp://hdl.handle.net/11693/15419
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectParallel data mining
dc.subjectFrequency mining
dc.titleEfficient parallel frequency mining based on a novel top-down partitioning scheme for transactional dataen_US
dc.title.alternativeYeni bir işlem verisi parçalama şeması tabanlı etkin bir paralel frekans tarama
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)
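
The two-way partitioning described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the toy database, and the hand-picked separator are all assumptions made for illustration, a brute-force miner stands in for FP-Growth, and no GPVS algorithm is run. The projection rule used here (keep every transaction, restricted to one part plus the separator) is one simple way to preserve support counts and may differ from the rule used in the thesis.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_graph(transactions, min_support):
    """Build the graph of frequent patterns of length two.

    Vertices are frequent items; edges are frequent item pairs.
    """
    item_counts = Counter(i for t in transactions for i in set(t))
    vertices = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & vertices)
        pair_counts.update(combinations(kept, 2))
    edges = {p for p, c in pair_counts.items() if c >= min_support}
    return vertices, edges


def split_by_separator(transactions, part_a, separator, part_b, edges):
    """Map a vertex-separator partition to two transaction sets.

    Items in the separator are replicated in both parts (hypothetical
    projection rule, chosen for simplicity).
    """
    for u, v in edges:
        if (u in part_a and v in part_b) or (u in part_b and v in part_a):
            raise ValueError("not a separator: edge crosses the two parts")

    def project(keep):
        projected = [sorted(set(t) & keep) for t in transactions]
        return [t for t in projected if t]  # drop emptied transactions

    return project(part_a | separator), project(part_b | separator)


def brute_force_frequent(transactions, min_support):
    """Stand-in serial miner (exponential; only for tiny examples)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return {frozenset(p): c for p, c in counts.items() if c >= min_support}


# Toy database (assumed data, not from the thesis).
db = [list("abc"), list("abc"), list("cde"), list("cde"), list("ac"), list("be")]
min_support = 2

vertices, edges = frequent_pairs_graph(db, min_support)
# Hand-picked separator {c}: no frequent pair joins {a, b} with {d, e}.
db_a, db_b = split_by_separator(db, {"a", "b"}, {"c"}, {"d", "e"}, edges)

full = brute_force_frequent(db, min_support)
# The two parts are mined independently, then the results are merged.
merged = {**brute_force_frequent(db_a, min_support),
          **brute_force_frequent(db_b, min_support)}
print(merged == full)
```

The merge works because every frequent pattern is a clique in the graph of frequent pairs, and a clique cannot contain vertices from both sides of a separator. Patterns made only of separator items are found in both parts, which is the replication cost that a smaller separator reduces.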

Files

Original bundle
Name: 0002014.pdf
Size: 3.66 MB
Format: Adobe Portable Document Format