Parallel sequence mining on distributed-memory systems

buir.advisorGürsoy, Atilla
dc.contributor.authorKarapınar, Embiya
dc.date.accessioned2016-07-01T11:11:37Z
dc.date.available2016-07-01T11:11:37Z
dc.date.issued2001
dc.descriptionCataloged from PDF version of article.en_US
dc.description.abstractDiscovering all the frequent sequences in very large databases is a time-consuming task. Moreover, a large database must be partitioned into chunks of data that can be processed in main memory. Most current algorithms require as many database scans as the length of the longest frequent sequence. Spade is a fast algorithm that reduces the number of database scans to three by using a lattice-theoretic approach to decompose the original problem into small pieces (equivalence classes) that can be processed in main memory independently. In this thesis, we present dSpade, a parallel algorithm based on Spade for discovering the set of all frequent sequences, targeting distributed-memory systems. dSpade uses horizontal database partitioning, where each processor stores an equal number of customer transactions. dSpade is synchronous while discovering frequent 1-sequences (F1) and frequent 2-sequences (F2): each processor performs the same computation on its local data to obtain local support counts and broadcasts the results to the other processors to find the globally frequent sequences. After all of F1 and F2 have been discovered, all frequent sequences are inserted into the lattice to decompose the original problem into equivalence classes. The equivalence classes are mapped by a greedy heuristic to the least loaded processors in a round-robin manner. Finally, each processor asynchronously computes Fk on its assigned equivalence classes to find all frequent sequences. We present the results of performance experiments conducted on a 32-node Beowulf cluster. The experiments show that dSpade delivers good speedup and scales linearly with the database size.en_US
dc.description.provenanceMade available in DSpace on 2016-07-01T11:11:37Z (GMT). No. of bitstreams: 1 0001607.pdf: 476592 bytes, checksum: 8a3f79050e1eea28bfc2958dd5d28412 (MD5) Previous issue date: 2001en
dc.description.statementofresponsibilityKarapınar, Embiyaen_US
dc.format.extent50 leavesen_US
dc.identifier.itemidBILKUTUPB056066
dc.identifier.urihttp://hdl.handle.net/11693/30067
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectLatticeen_US
dc.subjectequivalence classen_US
dc.subjecthorizontal database partitioning methoden_US
dc.subject.lccQA76.9.D343 K37 2001en_US
dc.subject.lcshData mining.en_US
dc.titleParallel sequence mining on distributed-memory systemsen_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)
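
The abstract describes mapping equivalence classes to the least loaded processors with a greedy heuristic. The following is a minimal Python sketch of one way such a mapping could work, not the thesis's actual implementation. The function name `assign_classes`, the per-class weights, and the heaviest-first ordering are all assumptions introduced for illustration; the abstract does not say how load is estimated or in what order classes are placed.

```python
import heapq

def assign_classes(class_weights, num_procs):
    """Greedily map each equivalence class to the currently least loaded processor.

    class_weights: dict of class name -> estimated cost (hypothetical load measure).
    num_procs: number of processors.
    Returns a dict of processor id -> list of class names.
    """
    # Min-heap of (current_load, processor_id); ties resolve to the lowest id.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    # Assumed ordering: heaviest classes first, names break ties deterministically.
    ordered = sorted(class_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, weight in ordered:
        load, proc = heapq.heappop(heap)   # least loaded processor so far
        assignment[proc].append(name)
        heapq.heappush(heap, (load + weight, proc))
    return assignment

# Hypothetical example: four classes with made-up weights on two processors.
example = assign_classes({"A": 5, "B": 3, "C": 3, "D": 1}, 2)
```

Once the mapping is fixed, each processor could work through its own list independently, which matches the asynchronous Fk phase the abstract describes.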

Files

Original bundle
Name: 0001607.pdf
Size: 465.42 KB
Format: Adobe Portable Document Format
Description: Full printable version